
Posted to dev@mynewt.apache.org by Aditya Xavier <ad...@me.com.INVALID> on 2018/08/31 10:47:26 UTC

Mynewt crash when releasing semaphore

Hi !

I am having an issue with sending and receiving a mesh message, though I am positive the problem is related to releasing the semaphore.

Action Received over MESH Length :- 15
012273 Unhandled interrupt (3), exception sp 0x2000abd0
012273  r0:0xd7229882  r1:0xd929b3bb  r2:0xcf0f98cb  r3:0x5c5a76b3
012273  r4:0x1b000000  r5:0x2000acc0  r6:0x2000aca0  r7:0x00000008
012273  r8:0x00000000  r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
012273 r12:0x681af5c8  lr:0xb1334673  pc:0x7e3cdeb8 psr:0x2266a80b
012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
012273 BFAR:0xe000ed38 MMFAR:0xe000ed34

I am sending a group mesh message for testing. The sequence of events is as follows:

Button TASK -> send message over MESH -> Mesh receives message on model -> copies the data and releases the semaphore for another task -> LOG task starts and logs the message.

In this flow, the firmware crashes the moment I receive the message and release the semaphore.

I tried increasing the stack size of the LOG task, but that didn’t help.

Could someone explain how to work out where and why the crash is happening?

Thanks,
Aditya Xavier.
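One common way a hand-off like the one described above corrupts memory is copying a received payload into a buffer that is smaller than the payload. The sketch below assumes a hypothetical fixed-size log_msg slot shared between the mesh model callback and the LOG task (the names and the 32-byte size are illustrative, not taken from the application); the copy is length-checked before the semaphore is released.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-off slot shared by the mesh model callback (producer)
 * and the LOG task (consumer). The size is illustrative only. */
#define LOG_MSG_MAX 32

struct log_msg {
    uint8_t data[LOG_MSG_MAX];
    size_t len;
};

/* Copy a received payload into the slot. Returns 0 on success and -1 if the
 * payload would not fit, leaving the slot untouched instead of overrunning it. */
static int log_msg_copy(struct log_msg *slot, const uint8_t *payload, size_t len)
{
    if (slot == NULL || payload == NULL || len > sizeof(slot->data)) {
        return -1;
    }
    memcpy(slot->data, payload, len);
    slot->len = len;
    /* Only after this point would the firmware release the semaphore,
     * e.g. with Mynewt's os_sem_release(), so the LOG task never sees a
     * partially written slot. */
    return 0;
}
```

In the firmware, the receive callback would release the semaphore only when log_msg_copy() returns 0; the LOG task then reads slot->len bytes after pending on the semaphore.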

Re: Mynewt crash when releasing semaphore

Posted by marko kiiskila <ma...@runtime.io>.

It’s easiest to inspect these addresses with gdb :)

arm-none-eabi-gdb bin/targets/……. .elf

and then start feeding it those addresses to see which ones look likely to be part
of the call chain.

x/i 0x0003b4d8
x/i 0x000246a7
x/i 0x0003b4d8
etc
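Feeding addresses to gdb one at a time gets tedious with a long OS_CRASH_STACKTRACE dump. A small host-side helper can turn each "0xSTACKADDR: 0xVALUE" line into a gdb command. This is a sketch that assumes the line format shown in the dumps in this thread (timestamp, stack address, colon, value); the helper names are hypothetical and it is not part of Mynewt.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one stack-trace line such as "000486  0x2000abec: 0x0003b4d8"
 * and store the value found on the stack. Returns 1 on success and 0 for
 * any other line (register dump, task name, ICSR/CFSR line, ...). */
static int parse_stack_line(const char *line, uint32_t *value)
{
    return sscanf(line, "%*u 0x%*x: 0x%" SCNx32, value) == 1;
}

/* Format a gdb command that disassembles the instruction at `value`.
 * Returns 1 if the whole command fit into buf. */
static int gdb_cmd_for(uint32_t value, char *buf, size_t len)
{
    int n = snprintf(buf, len, "x/i 0x%08" PRIx32, value);
    return n > 0 && (size_t)n < len;
}
```

Run it over the saved console log (for example in an fgets() loop), write the generated commands to a file, and load that file with gdb's source command against the same .elf that produced the dump.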

> On Aug 31, 2018, at 3:30 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
> 
> I am really bad at GDB. Also, it’s like a rabbit hole :)
> 
> I ported my application over to the git version of Mynewt-core and enabled OS_CRASH_STACKTRACE.

Re: Mynewt crash when releasing semaphore

Posted by Aditya Xavier <ad...@me.com.INVALID>.

I am really bad at GDB. Also, it’s like a rabbit hole :)

I ported my application over to the git version of Mynewt-core and enabled OS_CRASH_STACKTRACE.

With it enabled, this is the dump:

#mesh-onoff STATUS: Sent !
Action Received over MESH Length :- 14
000486 Unhandled interrupt (3), exception sp 0x2000aba0
000486  r0:0xcf0f98cb  r1:0x5c5a76b3  r2:0x681af5c8  r3:0xb1334673
000486  r4:0x2000ac68  r5:0x00000007  r6:0x00000000  r7:0x200008a9
000486  r8:0x2000acf0  r9:0x00012101 r10:0xd7229882 r11:0xd929b3bb
000486 r12:0x7e3cdeb8  lr:0x2266a80b  pc:0x59d8de5b psr:0xe8eb9828
000486 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x00040000
000486 BFAR:0xe000ed38 MMFAR:0xe000ed34
000486 task:DECODE_TASK
000486  0x2000abec: 0x0003b4d8
000486  0x2000abf4: 0x000246a7
000486  0x2000ac04: 0x0003b4d8
000486  0x2000ac0c: 0x0002488d
000486  0x2000ac4c: 0x00012101
000486  0x2000ad0c: 0x0000c1e7
000486  0x2000ad1c: 0x0000c1e7
000486  0x2000ad2c: 0x0000c211
000486  0x2000ad30: 0x0003ad44
000486  0x2000ad3c: 0x00013023
000486  0x2000ad58: 0x000238e1
000486  0x2000ad60: 0x00037f81
000486  0x2000ad6c: 0x00023a79
000486  0x2000ad70: 0x00039b80
000486  0x2000ad74: 0x00039b7f
000486  0x2000ad84: 0x00023587
000486  0x2000ada8: 0x000087cd
000486  0x2000adc4: 0x0000d51d
000486  0x2000adc8: 0x0000d51c
000486  0x2000add8: 0x000398cd
000486  0x2000ade4: 0x000087e9
000486  0x2000ae08: 0x00010001
000486  0x2000ae0c: 0x0001c239
000486  0x2000ae10: 0x0003b35c
000486  0x2000ae1c: 0x00020001
000486  0x2000ae20: 0x0001c38d
000486  0x2000ae30: 0x00030001
000486  0x2000ae34: 0x0001c509
000486  0x2000ae48: 0x0001c38d
000486  0x2000ae5c: 0x0001c509
000486  0x2000ae70: 0x0001c239
000486  0x2000ae74: 0x0003b37c
000486  0x2000ae84: 0x0001c38d
000486  0x2000ae98: 0x0001c509
000486  0x2000aeac: 0x0001c54d
000486  0x2000aec0: 0x0001c239
000486  0x2000aec4: 0x0003ba28
000486  0x2000aed4: 0x0001c38d
000486  0x2000aee8: 0x0001c509
000486  0x2000aefc: 0x0001c38d
000486  0x2000af10: 0x0001c509
000486  0x2000af24: 0x0001c54d
000486  0x2000af38: 0x0001c38d
000486  0x2000af4c: 0x0001c509
000486  0x2000af60: 0x0001c38d
000486  0x2000af74: 0x0001c509
000486  0x2000af88: 0x0001c54d
000486  0x2000af9c: 0x0001c38d
000486  0x2000afb0: 0x0001c509


> On 31-Aug-2018, at 5:21 PM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Some suggestions (inline).
> 
>> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
>> 
>> Gosh, this doesn’t make much sense to me :(
>> 
>> (gdb) monitor go
>> (gdb) monitor reset
>> Resetting target
>> (gdb) c
>> Continuing.
>> 
>> Program received signal SIGTRAP, Trace/breakpoint trap.
>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> 50	            asm("bkpt");
>> (gdb) bt
>> #0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> #2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>> #3  <signal handler called>
>> #4  0x00000000 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) frame 1
>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>> 170	    hal_system_reset();
>> (gdb) p/x *tf
>> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7 = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr = 0xfffffffd}
>> (gdb) p/x *tf->ef
>> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, r12 = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
>> (gdb) x/32x 0xd7229882
>> 0xd7229882:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd7229892:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298a2:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298b2:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298c2:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298d2:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298e2:	0x00000000	0x00000000	0x00000000	0x00000000
>> 0xd72298f2:	0x00000000	0x00000000	0x00000000	0x00000000
>> (gdb) x/32x 0x2000abd0
>> 0x2000abd0:	0xd7229882	0xd929b3bb	0xcf0f98cb	0x5c5a76b3
>> 0x2000abe0:	0x681af5c8	0xb1334673	0x7e3cdeb8	0x2266a80b
>> 0x2000abf0:	0x59d8de5b	0xe8eb9828	0x96d74690	0xb4b1ee9b
>> 0x2000ac00:	0x95f0cad6	0x7d1b52fe	0xebcc146e	0x5f7dfaf5
>> 0x2000ac10:	0x62dd2c19	0x1fc67ee7	0xf40a6a89	0xab77907c
> 
> ^^^^^ looks bad, especially the top area. It should contain the registers
> saved at the time of the crash.
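On exception entry a Cortex-M core pushes r0-r3, r12, lr, pc and xPSR onto the stack, in that order, which is also the field order gdb printed for *tf->ef. Reading those eight words back and sanity-checking them is a quick way to tell whether the saved frame itself was overwritten. A minimal sketch with hypothetical struct and function names; the check assumes you know the text range of the same build. For the div0 example the frame passes, while the mesh crash frame fails: its pc (0x7e3cdeb8) is far outside the text range, and the Thumb bit (bit 24) is clear in its xPSR (0x2266a80b).

```c
#include <stdint.h>

/* Registers in the order an ARMv7-M core stacks them on exception entry
 * (basic frame, no FP context). */
struct hw_exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Build a frame from 8 consecutive words read at the exception SP,
 * e.g. the first two rows of "x/32x <exception sp>". */
static struct hw_exc_frame frame_from_words(const uint32_t w[8])
{
    struct hw_exc_frame f = { w[0], w[1], w[2], w[3], w[4], w[5], w[6], w[7] };
    return f;
}

/* Rough sanity check: the stacked PC should fall inside the image's text
 * range [text_start, text_end) and the xPSR Thumb bit (bit 24) must be set,
 * since Cortex-M only executes Thumb code. */
static int frame_looks_sane(const struct hw_exc_frame *f,
                            uint32_t text_start, uint32_t text_end)
{
    int pc_in_text = f->pc >= text_start && f->pc < text_end;
    int thumb = (int)((f->xpsr >> 24) & 1u);
    return pc_in_text && thumb;
}
```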
> 
> 
>> 0x2000ac20:	0x00000010	0x00039c74	0x2000ad28	0x0002329f
>> 0x2000ac30:	0xd87c5730	0xa203a288	0x00000010	0x00039c74
>> 0x2000ac40:	0x2000ad28	0x00023485	0x00000000	0x00000000
>> (gdb) p &__text
>> No symbol "__text" in current context.
>> (gdb)  p &__etext
>> $3 = (<data variable, no debug info> *) 0x3a9c8
>> (gdb) p &__text
>> No symbol "__text" in current context.
> 
> The __text symbol was probably added at the same time as OS_CRASH_STACKTRACE.
> You’re looking for values between the start of your image slot and 0x3a9c8.
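Walking the stack by hand amounts to scanning each word and keeping those that fall between the start of the image and __etext. Return addresses pushed by Thumb code usually have bit 0 set, so odd values in that range are the best candidates, while even ones are more often pointers to constant data (compare 0x0002329f, which resolved to shift_rows+108, with 0x00039c74, which resolved to the sbox table). A sketch with hypothetical names, assuming an image start of 0x8000 for illustration (the thread does not give it) and an __etext value taken from the same build as the dump:

```c
#include <stddef.h>
#include <stdint.h>

struct stack_hit {
    uint32_t addr;   /* where on the stack the word was found */
    uint32_t value;  /* the word itself: a candidate return address */
};

/* Scan nwords consecutive stack words starting at address `base` and record
 * every odd value inside [text_start, text_end). Returns the number of hits
 * stored (at most max_hits). */
static size_t scan_for_return_addrs(const uint32_t *words, size_t nwords,
                                    uint32_t base,
                                    uint32_t text_start, uint32_t text_end,
                                    struct stack_hit *hits, size_t max_hits)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords && n < max_hits; i++) {
        uint32_t w = words[i];
        if (w >= text_start && w < text_end && (w & 1u)) {
            hits[n].addr = base + (uint32_t)(i * 4u);
            hits[n].value = w;
            n++;
        }
    }
    return n;
}
```

With the rows from 0x2000ac10 to 0x2000ac4c above and a range of 0x8000 to 0x3a9c8, this reports 0x0002329f at 0x2000ac2c and 0x00023485 at 0x2000ac44; each hit can then be checked with list *ADDR in gdb.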
> 
>> (gdb) x/i 0xd7229882
>>  0xd7229882:	movs	r0, r0
>> (gdb) list *0xd7229882
>> (gdb) x/i 0x681af5c8
>>  0x681af5c8:	movs	r0, r0
>> (gdb) x/i 0x59d8de5b
>>  0x59d8de5b:	movs	r0, r0
>> (gdb) x/i 0x62dd2c19
>>  0x62dd2c19:	movs	r0, r0
>> (gdb) x/i 0x2000ad28
>>  0x2000ad28:	lsls	r0, r2, #6
>> (gdb) x/i 0x1fc67ee7
>>  0x1fc67ee7:	movs	r0, r0
>> (gdb) x/i 0xa203a288
>>  0xa203a288:	movs	r0, r0
>> (gdb) x/i 0xe8eb9828
>>  0xe8eb9828:	movs	r0, r0
>> (gdb) x/i 0xcf0f98cb
>>  0xcf0f98cb:	movs	r0, r0
>> (gdb) x/i 0x96d74690
>>  0x96d74690:	movs	r0, r0
>> (gdb) x/i 0xf40a6a89
>>  0xf40a6a89:	movs	r0, r0
>> (gdb) x/i 0x2000ad28
>>  0x2000ad28:	lsls	r0, r2, #6
>> (gdb) x/i 0x00000010
>>  0x10:	movs	r0, r0
>> (gdb) x/i 0x0002329f
>>  0x2329f <shift_rows+108>:	add	sp, #20
>> (gdb) x/i 0x00039c74
>>  0x39c74 <sbox>:	ldrb	r3, [r4, #17]
>> (gdb) x/i 0xa203a288
>>  0xa203a288:	movs	r0, r0
>> (gdb) x/i 0x0002329f
>>  0x2329f <shift_rows+108>:	add	sp, #20
>> (gdb) list *0x0002329f
>> 0x2329f is in shift_rows (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
>> 151		t[0]  = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
>> 152		t[4]  = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
>> 153		t[8]  = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
>> 154		t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
>> 155		(void) _copy(s, sizeof(t), t, sizeof(t));
>> 156	}
>> 157	
>> 158	int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const TCAesKeySched_t s)
>> 159	{
>> 160		uint8_t state[Nk*Nb];
> 
> That could be what is writing that random-looking data onto the stack; encrypted data should
> look like gibberish.
> Follow the stack a bit further, continuing from 0x2000ac50, and see if you
> can find who called it. I’m hazarding a guess that one of the args passed to aes_encrypt()
> points to the stack, and there’s not enough memory allocated there to hold that data.
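tc_aes_encrypt() always works on one full block: the state[Nk*Nb] array in the listing above is 4 x 4 = 16 bytes, and a full 16-byte block is read from in and written to out. If any buffer handed to it is sized to the payload length instead (the logs in this thread show payloads of 14 and 15 bytes), the extra bytes land in whatever sits next to it on the stack. Whether the application actually sizes a buffer that way is an assumption; the arithmetic check itself is sketched below with hypothetical helper names.

```c
#include <stddef.h>

/* One AES block, as processed by a single tc_aes_encrypt() call. */
#define AES_BLOCK_LEN 16u

/* Bytes an output buffer must provide when a msg_len-byte payload is
 * processed one full AES block at a time (rounded up to a block multiple). */
static size_t aes_padded_len(size_t msg_len)
{
    return (msg_len + AES_BLOCK_LEN - 1u) / AES_BLOCK_LEN * AES_BLOCK_LEN;
}

/* 1 if a buffer of buf_len bytes can hold the block-wise output for
 * msg_len bytes of input, 0 otherwise. */
static int aes_buf_ok(size_t buf_len, size_t msg_len)
{
    return buf_len >= aes_padded_len(msg_len);
}
```

For example, a 15-byte payload needs a 16-byte output buffer, so aes_buf_ok(15, 15) is false while aes_buf_ok(16, 15) is true.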
> 
> 
>>> On 31-Aug-2018, at 4:46 PM, marko kiiskila <ma...@runtime.io> wrote:
>>> 
>>> Sure. Something like this:
>>> 
>>> 000933 compat> crash div0
>>> crash div0
>>> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
>>> 003157  r0:0x00000000  r1:0x00017161  r2:0x00000000  r3:0x0000002a
>>> 003157  r4:0x200041d6  r5:0x00000000  r6:0x20000318  r7:0x00000000
>>> 003157  r8:0x00000000  r9:0x00000000 r10:0x00000000 r11:0x00000000
>>> 003157 r12:0x00000000  lr:0x00014949  pc:0x00014978 psr:0x61000000
>>> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
>>> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
>>> 
>>> Then from gdb:
>>> 
>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>> hal_system_reset ()
>>>  at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> 50	            asm("bkpt");
>>> (gdb) bt
>>> #0  hal_system_reset ()
>>>  at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> #2  0x0000a5b6 in os_default_irq_asm ()
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>> #3  <signal handler called>
>>> #4  0x00000000 in ?? ()
>>> #5  0x0000812c in Reset_Handler ()
>>>  at repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> (gdb) frame 1
>>> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>>>  at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
>>> 171	    hal_system_reset();
>>> (gdb) p/x *tf
>>> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0, 
>>> r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
>>> (gdb) p/x *tf->ef
>>> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949, 
>>> pc = 0x14978, psr = 0x61000000}
>>> (gdb) x/32x 0x20001dd8
>>> 0x20001dd8 <os_main_stack+3896>:	0x00000000	0x00017161	0x00000000	0x0000002a
>>> 0x20001de8 <os_main_stack+3912>:	0x00000000	0x00014949	0x00014978	0x61000000
>>> 0x20001df8 <os_main_stack+3928>:	0x00000003	0x00000000	0x00000000	0x0000002a
>>> 0x20001e08 <os_main_stack+3944>:	0x00000001	0x00000002	0x0000000a	0x00014a21
>>> 0x20001e18 <os_main_stack+3960>:	0x00014a15	0x0000ebd9	0x00000000	0x200041d0
>>> 0x20001e28 <os_main_stack+3976>:	0x200041d6	0x00000000	0x0000000a	0x0001574d
>>> 0x20001e38 <os_main_stack+3992>:	0x00015741	0x0000c925	0x200041d0	0x00000011
>>> 0x20001e48 <os_main_stack+4008>:	0x00000073	0x200041d3	0x00000000	0x0000ede9
>>> (gdb) p &__text
>>> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
>>> (gdb) p &__etext
>>> $4 = (<data variable, no debug info> *) 0x175f0
>>> (gdb) x/i 0x00017161
>>> 0x17161:	movs	r0, r0
>>> (gdb) x/i 0x00014949
>>> 0x14949 <crash_device+12>:	cbz	r0, 0x1496a <crash_device+46>
>>> (gdb) x/i 0x00014978
>>> 0x14978 <crash_device+60>:	sdiv	r3, r3, r2
>>> (gdb) x/i 0x00014a21
>>> 0x14a21 <crash_cli_cmd+12>:	cbz	r0, 0x14a28 <crash_cli_cmd+20>
>>> (gdb) x/i 0x00014a15
>>> 0x14a15 <crash_cli_cmd>:	push	{r3, lr}
>>> (gdb) list *0x14949
>>> 0x14949 is in crash_device (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
>>> warning: Source file is more recent than executable.
>>> 37	int
>>> 38	crash_device(char *how)
>>> 39	{
>>> 40	    volatile int val1, val2, val3;
>>> 41	
>>> 42	    if (!strcmp(how, "div0")) {
>>> 43	
>>> 44	        val1 = 42;
>>> 45	        val2 = 0;
>>> 46	
>>> (gdb) list *0x00014a21
>>> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36	};
>>> 37	
>>> 38	static int
>>> 39	crash_cli_cmd(int argc, char **argv)
>>> 40	{
>>> 41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42	        return 0;
>>> 43	    }
>>> 44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45	    return 0;
>>> (gdb) list *0x14a21
>>> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
>>> 36	};
>>> 37	
>>> 38	static int
>>> 39	crash_cli_cmd(int argc, char **argv)
>>> 40	{
>>> 41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
>>> 42	        return 0;
>>> 43	    }
>>> 44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
>>> 45	    return 0;
>>> 
>>> good luck.
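The fault status registers printed with each crash can also be decoded by hand. On ARMv7-M, CFSR packs the MemManage, BusFault and UsageFault status bits: in the div0 example CFSR 0x02000000 is DIVBYZERO, while the mesh crash shows CFSR 0x00040000, which is INVPC (an invalid EXC_RETURN value on exception return, consistent with a corrupted stack). HFSR 0x40000000 is FORCED, meaning the fault was escalated to HardFault, which is exception 3 in "Unhandled interrupt (3)". BFAR and MMFAR only hold fault addresses when BFARVALID or MMARVALID is set, which is not the case in either dump. A host-side sketch with hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CFSR bit names on ARMv7-M: MMFSR in bits 0-7, BFSR in bits 8-15,
 * UFSR in bits 16-31. Reserved bits are not listed. */
static const struct {
    uint32_t mask;
    const char *name;
} cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },  { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },   { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },       { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/* Write the names of all set CFSR bits, space-separated, into buf.
 * Output is truncated (but always NUL-terminated) if buf is too small. */
static void cfsr_describe(uint32_t cfsr, char *buf, size_t len)
{
    if (buf == NULL || len == 0) {
        return;
    }
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(cfsr_bits) / sizeof(cfsr_bits[0]); i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0) {
            continue;
        }
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s", used ? " " : "", cfsr_bits[i].name);
    }
}

/* HFSR bit 30 (FORCED): a configurable fault was escalated to HardFault. */
static int hfsr_forced(uint32_t hfsr)
{
    return (int)((hfsr >> 30) & 1u);
}
```

With the values from the logs, cfsr_describe() yields "DIVBYZERO" for 0x02000000 and "INVPC" for 0x00040000, and hfsr_forced(0x40000000) is 1.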
>>> 
>>>> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
>>>> 
>>>> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1 and hence not in the release.
>>>> 
>>>> If I change the release, I believe many API changes would be needed on the MESH side.
>>>> 
>>>> Can you guide me on how to “manually walk the stack for looking for things which look like pointers to text”?
>>>> 
>>>> My gdb skills are pretty weak.
>>>> 
>>>> I tried gdb's where command, with the following outcome:
>>>> 
>>>> (gdb) c
>>>> Continuing.
>>>> 
>>>> 
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50	            asm("bkpt");
>>>> (gdb) 
>>>> Continuing.
>>>> 
>>>> Program received signal SIGTRAP, Trace/breakpoint trap.
>>>> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> 50	            asm("bkpt");
>>>> (gdb) where
>>>> #0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
>>>> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
>>>> #2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
>>>> #3  <signal handler called>
>>>> #4  0x00000000 in ?? ()
>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>> 
>>>> 
>>>> 
>>>>> On 31-Aug-2018, at 4:30 PM, marko kiiskila <ma...@runtime.io> wrote:
>>>>> 
>>>>> Looking at your registers, they seem to be garbage, so I’m guessing stack
>>>>> corruption of some sort; it does not have to be an overflow.
>>>>> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack for looking for things which
>>>>> look like pointers to text.

Re: Mynewt crash when releasing semaphore

Posted by marko kiiskila <ma...@runtime.io>.

Some suggestions (inline).

> On Aug 31, 2018, at 2:32 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
> 
> Gosh, this doesn’t make much sense to me :(
> 
> (gdb) monitor go
> (gdb) monitor reset
> Resetting target
> (gdb) c
> Continuing.
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50	            asm("bkpt");
> (gdb) bt
> #0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> #2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
> #3  <signal handler called>
> #4  0x00000000 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) frame 1
> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> 170	    hal_system_reset();
> (gdb) p/x *tf
> $1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7 = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr = 0xfffffffd}
> (gdb) p/x *tf->ef
> $2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, r12 = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
> (gdb) x/32x 0xd7229882
> 0xd7229882:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd7229892:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298a2:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298b2:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298c2:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298d2:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298e2:	0x00000000	0x00000000	0x00000000	0x00000000
> 0xd72298f2:	0x00000000	0x00000000	0x00000000	0x00000000
> (gdb) x/32x 0x2000abd0
> 0x2000abd0:	0xd7229882	0xd929b3bb	0xcf0f98cb	0x5c5a76b3
> 0x2000abe0:	0x681af5c8	0xb1334673	0x7e3cdeb8	0x2266a80b
> 0x2000abf0:	0x59d8de5b	0xe8eb9828	0x96d74690	0xb4b1ee9b
> 0x2000ac00:	0x95f0cad6	0x7d1b52fe	0xebcc146e	0x5f7dfaf5
> 0x2000ac10:	0x62dd2c19	0x1fc67ee7	0xf40a6a89	0xab77907c

^^^^^ This looks bad, especially the top area. That should be the dump of the registers
stored at the time of the crash.
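For reference: at exception entry, Cortex-M hardware pushes eight registers onto the stack in a fixed order (r0, r1, r2, r3, r12, lr, pc, xPSR). That is why the first two rows of `x/32x` at the exception sp should match `p/x *tf->ef`. Below is a minimal sketch of that layout plus a rough sanity check. The struct and function names are hypothetical, and the text bounds are whatever `__text`/`__etext` give for your image.

With Marko's div0 frame, the check passes. With the frame at 0x2000abd0, it fails for two reasons. First, pc 0x7e3cdeb8 is far outside flash. Second, the stacked xPSR (0x2266a80b) has the Thumb bit (bit 24) clear, and that bit is normally set in a stacked Cortex-M frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers the Cortex-M hardware stacks at exception entry, lowest
 * address first (an FPU extended frame starts with the same eight words).
 * Same order as gdb prints for *tf->ef. */
struct exc_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* Build a frame from the first eight words found at the exception sp. */
static struct exc_frame frame_from_words(const uint32_t *sp_words)
{
    struct exc_frame f;

    assert(sp_words != NULL);
    f.r0 = sp_words[0];
    f.r1 = sp_words[1];
    f.r2 = sp_words[2];
    f.r3 = sp_words[3];
    f.r12 = sp_words[4];
    f.lr = sp_words[5];
    f.pc = sp_words[6];
    f.xpsr = sp_words[7];
    return f;
}

static int in_range(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v >= lo && v < hi;
}

/* Rough heuristic: pc and lr inside [text_start, text_end) and the Thumb
 * bit (xPSR bit 24) set. If the fault happened inside a handler, lr may
 * hold an EXC_RETURN value (0xFFFFFFxx) and this check would fail anyway. */
static int frame_looks_sane(const struct exc_frame *f,
                            uint32_t text_start, uint32_t text_end)
{
    return in_range(f->pc, text_start, text_end) &&
           in_range(f->lr, text_start, text_end) &&
           (f->xpsr & (1u << 24)) != 0;
}
```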


> 0x2000ac20:	0x00000010	0x00039c74	0x2000ad28	0x0002329f
> 0x2000ac30:	0xd87c5730	0xa203a288	0x00000010	0x00039c74
> 0x2000ac40:	0x2000ad28	0x00023485	0x00000000	0x00000000
> (gdb) p &__text
> No symbol "__text" in current context.
> (gdb)  p &__etext
> $3 = (<data variable, no debug info> *) 0x3a9c8
> (gdb) p &__text
> No symbol "__text" in current context.

The __text symbol was probably added at the same time as OS_CRASH_STACKTRACE.
You’re looking for values between the start of your image slot and 0x3a9c8.
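Checking each word with `x/i` works, but the range test itself is mechanical. Here is a small sketch of it. It assumes you paste the words from `x/32x` into an array and know the text bounds. `__etext` is 0x3a9c8 here. The start value 0x8020 is only an assumption borrowed from the nrf52dk layout in Marko's example, so check your own image slot. The helper names are hypothetical.

Constant data in flash, such as `sbox`, also falls inside this range. Every hit therefore still needs `x/i` or `list *` to confirm that it is really code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the indices of words that fall inside [text_start, text_end),
 * i.e. that could be code addresses. Return addresses on Cortex-M usually
 * have bit 0 set (Thumb), but even values are kept too because a stacked
 * pc is even. Returns the number of indices written to out_idx.
 */
static size_t find_text_candidates(const uint32_t *words, size_t n,
                                   uint32_t text_start, uint32_t text_end,
                                   size_t *out_idx, size_t max_out)
{
    size_t found = 0;

    assert(words != NULL && out_idx != NULL);
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (words[i] >= text_start && words[i] < text_end) {
            out_idx[found++] = i;
        }
    }
    return found;
}

/* Address of word 'idx' in a dump that started at 'base' (4-byte words). */
static uint32_t word_address(uint32_t base, size_t idx)
{
    return base + (uint32_t)(idx * 4u);
}
```

Running it over the 32 words dumped from 0x2000abd0 above, with those bounds, flags four words: 0x00039c74 (twice), 0x0002329f and 0x00023485. The last one, at 0x2000ac44, is not among the addresses checked in the session above, so `list *0x00023485` may be worth a try.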

> (gdb) x/i 0xd7229882
>   0xd7229882:	movs	r0, r0
> (gdb) list *0xd7229882
> (gdb) x/i 0x681af5c8
>   0x681af5c8:	movs	r0, r0
> (gdb) x/i 0x59d8de5b
>   0x59d8de5b:	movs	r0, r0
> (gdb) x/i 0x62dd2c19
>   0x62dd2c19:	movs	r0, r0
> (gdb) x/i 0x2000ad28
>   0x2000ad28:	lsls	r0, r2, #6
> (gdb) x/i 0x1fc67ee7
>   0x1fc67ee7:	movs	r0, r0
> (gdb) x/i 0xa203a288
>   0xa203a288:	movs	r0, r0
> (gdb) x/i 0xe8eb9828
>   0xe8eb9828:	movs	r0, r0
> (gdb) x/i 0xcf0f98cb
>   0xcf0f98cb:	movs	r0, r0
> (gdb) x/i 0x96d74690
>   0x96d74690:	movs	r0, r0
> (gdb) x/i 0xf40a6a89
>   0xf40a6a89:	movs	r0, r0
> (gdb) x/i 0x2000ad28
>   0x2000ad28:	lsls	r0, r2, #6
> (gdb) x/i 0x00000010
>   0x10:	movs	r0, r0
> (gdb) x/i 0x0002329f
>   0x2329f <shift_rows+108>:	add	sp, #20
> (gdb) x/i 0x00039c74
>   0x39c74 <sbox>:	ldrb	r3, [r4, #17]
> (gdb) x/i 0xa203a288
>   0xa203a288:	movs	r0, r0
> (gdb) x/i 0x0002329f
>   0x2329f <shift_rows+108>:	add	sp, #20
> (gdb) list *0x0002329f
> 0x2329f is in shift_rows (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
> 151		t[0]  = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
> 152		t[4]  = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
> 153		t[8]  = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
> 154		t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
> 155		(void) _copy(s, sizeof(t), t, sizeof(t));
> 156	}
> 157	
> 158	int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const TCAesKeySched_t s)
> 159	{
> 160		uint8_t state[Nk*Nb];

That could be what wrote that random-looking data onto the stack; encrypted data should
look like gibberish.
Follow the stack a bit further, continuing from 0x2000ac50, and see if you can
find who called it. I’m hazarding a guess that one of the arguments passed to tc_aes_encrypt()
points to the stack, and that not enough memory is allocated there to hold the data.
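One way to test that guess is to give the suspect buffer a few extra bytes, fill them with a known pattern, and check them after the call. Below is a minimal sketch. A hypothetical `stub_encrypt_block()` stands in for a call like `tc_aes_encrypt()`, which writes a full 16-byte block (tinycrypt names that size `TC_AES_BLOCK_SIZE`). The guard length and pattern are arbitrary choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16u   /* AES block size (TC_AES_BLOCK_SIZE in tinycrypt) */
#define GUARD_LEN  8u    /* arbitrary number of guard bytes */
#define GUARD_BYTE 0xA5u /* arbitrary guard pattern */

/* Hypothetical stand-in for a block cipher call such as tc_aes_encrypt():
 * like the real call, it always writes a full BLOCK_SIZE bytes to 'out'. */
static void stub_encrypt_block(uint8_t *out, const uint8_t *in)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        out[i] = (uint8_t)(in[i] ^ 0x5Cu); /* not real encryption */
    }
}

/* Fill GUARD_LEN bytes right after the first 'declared_len' bytes of 'buf'.
 * 'buf' must really be at least declared_len + GUARD_LEN bytes long. */
static void guard_arm(uint8_t *buf, size_t declared_len)
{
    assert(buf != NULL);
    memset(buf + declared_len, GUARD_BYTE, GUARD_LEN);
}

/* Return 1 if the guard is untouched, 0 if something wrote past declared_len.
 * A stray write that happens to store GUARD_BYTE itself would go unnoticed. */
static int guard_intact(const uint8_t *buf, size_t declared_len)
{
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (buf[declared_len + i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Suppose the caller declared only 12 bytes and the guard fires after the call. That would mean the cipher wrote a full block past the end of the buffer, which is the kind of overrun suspected above. Treat this as a debugging aid, not proof.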



Re: Mynewt crash when releasing semaphore

Posted by Aditya Xavier <ad...@me.com.INVALID>.

Gosh, this doesn’t make much sense to me :(

(gdb) monitor go
(gdb) monitor reset
Resetting target
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
50	            asm("bkpt");
(gdb) bt
#0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
#1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
#2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
#3  <signal handler called>
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
170	    hal_system_reset();
(gdb) p/x *tf
$1 = {ef = 0x2000abd0, r4 = 0x1b000000, r5 = 0x2000acc0, r6 = 0x2000aca0, r7 = 0x7, r8 = 0x0, r9 = 0x200008a9, r10 = 0x2000ad28, r11 = 0x11d91, lr = 0xfffffffd}
(gdb) p/x *tf->ef
$2 = {r0 = 0xd7229882, r1 = 0xd929b3bb, r2 = 0xcf0f98cb, r3 = 0x5c5a76b3, r12 = 0x681af5c8, lr = 0xb1334673, pc = 0x7e3cdeb8, psr = 0x2266a80b}
(gdb) x/32x 0xd7229882
0xd7229882:	0x00000000	0x00000000	0x00000000	0x00000000
0xd7229892:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298a2:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298b2:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298c2:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298d2:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298e2:	0x00000000	0x00000000	0x00000000	0x00000000
0xd72298f2:	0x00000000	0x00000000	0x00000000	0x00000000
(gdb) x/32x 0x2000abd0
0x2000abd0:	0xd7229882	0xd929b3bb	0xcf0f98cb	0x5c5a76b3
0x2000abe0:	0x681af5c8	0xb1334673	0x7e3cdeb8	0x2266a80b
0x2000abf0:	0x59d8de5b	0xe8eb9828	0x96d74690	0xb4b1ee9b
0x2000ac00:	0x95f0cad6	0x7d1b52fe	0xebcc146e	0x5f7dfaf5
0x2000ac10:	0x62dd2c19	0x1fc67ee7	0xf40a6a89	0xab77907c
0x2000ac20:	0x00000010	0x00039c74	0x2000ad28	0x0002329f
0x2000ac30:	0xd87c5730	0xa203a288	0x00000010	0x00039c74
0x2000ac40:	0x2000ad28	0x00023485	0x00000000	0x00000000
(gdb) p &__text
No symbol "__text" in current context.
(gdb)  p &__etext
$3 = (<data variable, no debug info> *) 0x3a9c8
(gdb) p &__text
No symbol "__text" in current context.
(gdb) x/i 0xd7229882
   0xd7229882:	movs	r0, r0
(gdb) list *0xd7229882
(gdb) x/i 0x681af5c8
   0x681af5c8:	movs	r0, r0
(gdb) x/i 0x59d8de5b
   0x59d8de5b:	movs	r0, r0
(gdb) x/i 0x62dd2c19
   0x62dd2c19:	movs	r0, r0
(gdb) x/i 0x2000ad28
   0x2000ad28:	lsls	r0, r2, #6
(gdb) x/i 0x1fc67ee7
   0x1fc67ee7:	movs	r0, r0
(gdb) x/i 0xa203a288
   0xa203a288:	movs	r0, r0
(gdb) x/i 0xe8eb9828
   0xe8eb9828:	movs	r0, r0
(gdb) x/i 0xcf0f98cb
   0xcf0f98cb:	movs	r0, r0
(gdb) x/i 0x96d74690
   0x96d74690:	movs	r0, r0
(gdb) x/i 0xf40a6a89
   0xf40a6a89:	movs	r0, r0
(gdb) x/i 0x2000ad28
   0x2000ad28:	lsls	r0, r2, #6
(gdb) x/i 0x00000010
   0x10:	movs	r0, r0
(gdb) x/i 0x0002329f
   0x2329f <shift_rows+108>:	add	sp, #20
(gdb) x/i 0x00039c74
   0x39c74 <sbox>:	ldrb	r3, [r4, #17]
(gdb) x/i 0xa203a288
   0xa203a288:	movs	r0, r0
(gdb) x/i 0x0002329f
   0x2329f <shift_rows+108>:	add	sp, #20
(gdb) list *0x0002329f
0x2329f is in shift_rows (repos/apache-mynewt-core/crypto/tinycrypt/src/aes_encrypt.c:156).
151		t[0]  = s[0]; t[1] = s[5]; t[2] = s[10]; t[3] = s[15];
152		t[4]  = s[4]; t[5] = s[9]; t[6] = s[14]; t[7] = s[3];
153		t[8]  = s[8]; t[9] = s[13]; t[10] = s[2]; t[11] = s[7];
154		t[12] = s[12]; t[13] = s[1]; t[14] = s[6]; t[15] = s[11];
155		(void) _copy(s, sizeof(t), t, sizeof(t));
156	}
157	
158	int tc_aes_encrypt(uint8_t *out, const uint8_t *in, const TCAesKeySched_t s)
159	{
160		uint8_t state[Nk*Nb];

> On 31-Aug-2018, at 4:46 PM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Sure. Something like this:
> 
> 000933 compat> crash div0
> crash div0
> 003157 Unhandled interrupt (3), exception sp 0x20001dd8
> 003157  r0:0x00000000  r1:0x00017161  r2:0x00000000  r3:0x0000002a
> 003157  r4:0x200041d6  r5:0x00000000  r6:0x20000318  r7:0x00000000
> 003157  r8:0x00000000  r9:0x00000000 r10:0x00000000 r11:0x00000000
> 003157 r12:0x00000000  lr:0x00014949  pc:0x00014978 psr:0x61000000
> 003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
> 003157 BFAR:0xe000ed38 MMFAR:0xe000ed34
> 
> Then from gdb:
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset ()
>    at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50	            asm("bkpt");
> (gdb) bt
> #0  hal_system_reset ()
>    at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
> #2  0x0000a5b6 in os_default_irq_asm ()
>    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
> #3  <signal handler called>
> #4  0x00000000 in ?? ()
> #5  0x0000812c in Reset_Handler ()
>    at repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb) frame 1
> #1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
>    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
> 171	    hal_system_reset();
> (gdb) p/x *tf
> $1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0, 
>  r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
> (gdb) p/x *tf->ef
> $2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949, 
>  pc = 0x14978, psr = 0x61000000}
> (gdb) x/32x 0x20001dd8
> 0x20001dd8 <os_main_stack+3896>:	0x00000000	0x00017161	0x00000000	0x0000002a
> 0x20001de8 <os_main_stack+3912>:	0x00000000	0x00014949	0x00014978	0x61000000
> 0x20001df8 <os_main_stack+3928>:	0x00000003	0x00000000	0x00000000	0x0000002a
> 0x20001e08 <os_main_stack+3944>:	0x00000001	0x00000002	0x0000000a	0x00014a21
> 0x20001e18 <os_main_stack+3960>:	0x00014a15	0x0000ebd9	0x00000000	0x200041d0
> 0x20001e28 <os_main_stack+3976>:	0x200041d6	0x00000000	0x0000000a	0x0001574d
> 0x20001e38 <os_main_stack+3992>:	0x00015741	0x0000c925	0x200041d0	0x00000011
> 0x20001e48 <os_main_stack+4008>:	0x00000073	0x200041d3	0x00000000	0x0000ede9
> (gdb) p &__text
> $3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
> (gdb) p &__etext
> $4 = (<data variable, no debug info> *) 0x175f0
> (gdb) x/i 0x00017161
>   0x17161:	movs	r0, r0
> (gdb) x/i 0x00014949
>   0x14949 <crash_device+12>:	cbz	r0, 0x1496a <crash_device+46>
> (gdb) x/i 0x00014978
>   0x14978 <crash_device+60>:	sdiv	r3, r3, r2
> (gdb) x/i 0x00014a21
>   0x14a21 <crash_cli_cmd+12>:	cbz	r0, 0x14a28 <crash_cli_cmd+20>
> (gdb) x/i 0x00014a15
>   0x14a15 <crash_cli_cmd>:	push	{r3, lr}
> (gdb) list *0x14949
> 0x14949 is in crash_device (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
> warning: Source file is more recent than executable.
> 37	int
> 38	crash_device(char *how)
> 39	{
> 40	    volatile int val1, val2, val3;
> 41	
> 42	    if (!strcmp(how, "div0")) {
> 43	
> 44	        val1 = 42;
> 45	        val2 = 0;
> 46	
> (gdb) list *0x00014a21
> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
> 36	};
> 37	
> 38	static int
> 39	crash_cli_cmd(int argc, char **argv)
> 40	{
> 41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
> 42	        return 0;
> 43	    }
> 44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
> 45	    return 0;
> (gdb) list *0x14a21
> 0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
> 36	};
> 37	
> 38	static int
> 39	crash_cli_cmd(int argc, char **argv)
> 40	{
> 41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
> 42	        return 0;
> 43	    }
> 44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
> 45	    return 0;
> 
> good luck.

Re: Mynewt crash when releasing semaphore

Posted by marko kiiskila <ma...@runtime.io>.

Sure. Something like this:

000933 compat> crash div0
crash div0
003157 Unhandled interrupt (3), exception sp 0x20001dd8
003157  r0:0x00000000  r1:0x00017161  r2:0x00000000  r3:0x0000002a
003157  r4:0x200041d6  r5:0x00000000  r6:0x20000318  r7:0x00000000
003157  r8:0x00000000  r9:0x00000000 r10:0x00000000 r11:0x00000000
003157 r12:0x00000000  lr:0x00014949  pc:0x00014978 psr:0x61000000
003157 ICSR:0x00421803 HFSR:0x40000000 CFSR:0x02000000
003157 BFAR:0xe000ed38 MMFAR:0xe000ed34

Then from gdb:

Program received signal SIGTRAP, Trace/breakpoint trap.
hal_system_reset ()
    at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
50	            asm("bkpt");
(gdb) bt
#0  hal_system_reset ()
    at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
#1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
#2  0x0000a5b6 in os_default_irq_asm ()
    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
#3  <signal handler called>
#4  0x00000000 in ?? ()
#5  0x0000812c in Reset_Handler ()
    at repos/apache-mynewt-core/hw/bsp/nrf52dk/src/arch/cortex_m4/gcc_startup_nrf52.s:180
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) frame 1
#1  0x00008be8 in os_default_irq (tf=0x2000ffc0)
    at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:171
171	    hal_system_reset();
(gdb) p/x *tf
$1 = {ef = 0x20001dd8, r4 = 0x200041d6, r5 = 0x0, r6 = 0x20000318, r7 = 0x0, 
  r8 = 0x0, r9 = 0x0, r10 = 0x0, r11 = 0x0, lr = 0xfffffffd}
(gdb) p/x *tf->ef
$2 = {r0 = 0x0, r1 = 0x17161, r2 = 0x0, r3 = 0x2a, r12 = 0x0, lr = 0x14949, 
  pc = 0x14978, psr = 0x61000000}
(gdb) x/32x 0x20001dd8
0x20001dd8 <os_main_stack+3896>:	0x00000000	0x00017161	0x00000000	0x0000002a
0x20001de8 <os_main_stack+3912>:	0x00000000	0x00014949	0x00014978	0x61000000
0x20001df8 <os_main_stack+3928>:	0x00000003	0x00000000	0x00000000	0x0000002a
0x20001e08 <os_main_stack+3944>:	0x00000001	0x00000002	0x0000000a	0x00014a21
0x20001e18 <os_main_stack+3960>:	0x00014a15	0x0000ebd9	0x00000000	0x200041d0
0x20001e28 <os_main_stack+3976>:	0x200041d6	0x00000000	0x0000000a	0x0001574d
0x20001e38 <os_main_stack+3992>:	0x00015741	0x0000c925	0x200041d0	0x00000011
0x20001e48 <os_main_stack+4008>:	0x00000073	0x200041d3	0x00000000	0x0000ede9
(gdb) p &__text
$3 = (<data variable, no debug info> *) 0x8020 <__isr_vector>
(gdb) p &__etext
$4 = (<data variable, no debug info> *) 0x175f0
(gdb) x/i 0x00017161
   0x17161:	movs	r0, r0
(gdb) x/i 0x00014949
   0x14949 <crash_device+12>:	cbz	r0, 0x1496a <crash_device+46>
(gdb) x/i 0x00014978
   0x14978 <crash_device+60>:	sdiv	r3, r3, r2
(gdb) x/i 0x00014a21
   0x14a21 <crash_cli_cmd+12>:	cbz	r0, 0x14a28 <crash_cli_cmd+20>
(gdb) x/i 0x00014a15
   0x14a15 <crash_cli_cmd>:	push	{r3, lr}
(gdb) list *0x14949
0x14949 is in crash_device (repos/apache-mynewt-core/test/crash_test/src/crash_test.c:42).
warning: Source file is more recent than executable.
37	int
38	crash_device(char *how)
39	{
40	    volatile int val1, val2, val3;
41	
42	    if (!strcmp(how, "div0")) {
43	
44	        val1 = 42;
45	        val2 = 0;
46	
(gdb) list *0x00014a21
0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
36	};
37	
38	static int
39	crash_cli_cmd(int argc, char **argv)
40	{
41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
42	        return 0;
43	    }
44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
45	    return 0;
(gdb) list *0x14a21
0x14a21 is in crash_cli_cmd (repos/apache-mynewt-core/test/crash_test/src/crash_cli.c:41).
36	};
37	
38	static int
39	crash_cli_cmd(int argc, char **argv)
40	{
41	    if (argc >= 2 && crash_device(argv[1]) == 0) {
42	        return 0;
43	    }
44	    console_printf("Usage crash [div0|jump0|ref0|assert|wdog]\n");
45	    return 0;

good luck.
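The fault status registers in these dumps can also be decoded by hand:

- **Marko's div0 run:** CFSR is 0x02000000, which is DIVBYZERO (bit 25).
- **The original crash:** CFSR is 0x00040000, which is INVPC (bit 18). This means an illegal EXC_RETURN load into pc on exception return. That fits the stack corruption suspected in this thread, without proving it.
- **HFSR:** 0x40000000 (FORCED) in both dumps, meaning the configurable fault escalated to HardFault (exception 3).
- **MMFAR/BFAR:** neither CFSR value has MMARVALID or BFARVALID set, so the printed MMFAR and BFAR values are not meaningful here.

Below is a minimal decoder sketch with hypothetical helper names. The bit positions follow the ARMv7-M Architecture Reference Manual. MLSPERR and LSPERR exist only on parts with an FPU, such as the nRF52.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fault_bit {
    uint32_t mask;
    const char *name;
};

/* Configurable Fault Status Register (CFSR, 0xE000ED28) bits, ARMv7-M. */
static const struct fault_bit cfsr_bits[] = {
    /* MemManage fault status (MMFSR, bits 0-7) */
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    /* BusFault status (BFSR, bits 8-15) */
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault status (UFSR, bits 16-31) */
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },      { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },  { 1u << 25, "DIVBYZERO" },
};

#define N_CFSR_BITS (sizeof(cfsr_bits) / sizeof(cfsr_bits[0]))

/* Store up to max_out names of set CFSR bits in 'out'; return how many. */
static size_t cfsr_decode(uint32_t cfsr, const char **out, size_t max_out)
{
    size_t n = 0;

    assert(out != NULL || max_out == 0);
    for (size_t i = 0; i < N_CFSR_BITS && n < max_out; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            out[n++] = cfsr_bits[i].name;
        }
    }
    return n;
}

/* Return 1 if the CFSR bit called 'name' is set, e.g. cfsr_has(v, "INVPC"). */
static int cfsr_has(uint32_t cfsr, const char *name)
{
    for (size_t i = 0; i < N_CFSR_BITS; i++) {
        if (strcmp(cfsr_bits[i].name, name) == 0) {
            return (cfsr & cfsr_bits[i].mask) != 0;
        }
    }
    return 0;
}

/* HFSR (0xE000ED2C) bit 30, FORCED: a configurable fault escalated to HardFault. */
static int hfsr_forced(uint32_t hfsr)
{
    return (hfsr & (1u << 30)) != 0;
}
```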

> On Aug 31, 2018, at 2:10 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
> 
> It seems OS_CRASH_STACKTRACE was introduced after 1.4.1, so it is not in that release.
> 
> If I change releases, I believe there would be many API changes to make on the Mesh side.
> 
> Can you guide me on how to “manually walk the stack for looking for things which look like pointers to text”?
> 
> My gdb skills are pretty weak.
> 
> I tried gdb’s where command, with the following outcome:
> 
> (gdb) c
> Continuing.
> 
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50	            asm("bkpt");
> (gdb) 
> Continuing.
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> 50	            asm("bkpt");
> (gdb) where
> #0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
> #1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
> #2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
> #3  <signal handler called>
> #4  0x00000000 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Re: Mynewt crash when releasing semaphore

Posted by Aditya Xavier <ad...@me.com.INVALID>.

It seems OS_CRASH_STACKTRACE was introduced after 1.4.1, so it is not in that release.

If I change releases, I believe there would be many API changes to make on the Mesh side.

Can you guide me on how to “manually walk the stack for looking for things which look like pointers to text”?

My gdb skills are pretty weak.

I tried gdb’s where command, with the following outcome:

(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
50	            asm("bkpt");
(gdb) 
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
50	            asm("bkpt");
(gdb) where
#0  hal_system_reset () at repos/apache-mynewt-core/hw/mcu/nordic/nrf52xxx/src/hal_system.c:50
#1  0x0000bf2e in os_default_irq (tf=0x2000ffc8) at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/os_fault.c:170
#2  0x0000da56 in os_default_irq_asm () at repos/apache-mynewt-core/kernel/os/src/arch/cortex_m4/m4/HAL_CM4.s:260
#3  <signal handler called>
#4  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

> On 31-Aug-2018, at 4:30 PM, marko kiiskila <ma...@runtime.io> wrote:
> 
> Looking at your registers, they seem to be garbage, so I’m guessing stack
> corruption of some sort; it does not have to be an overflow.
> Try turning on OS_CRASH_STACKTRACE, or manually walk the stack looking for things which
> look like pointers to text.

Re: Mynewt crash when releasing semaphore

Posted by marko kiiskila <ma...@runtime.io>.


> On Aug 31, 2018, at 1:47 PM, Aditya Xavier <ad...@me.com.INVALID> wrote:
> 
> Action Received over MESH Length :- 15
> 012273 Unhandled interrupt (3), exception sp 0x2000abd0
> 012273  r0:0xd7229882  r1:0xd929b3bb  r2:0xcf0f98cb  r3:0x5c5a76b3
> 012273  r4:0x1b000000  r5:0x2000acc0  r6:0x2000aca0  r7:0x00000008
> 012273  r8:0x00000000  r9:0x200008a9 r10:0x2000ad28 r11:0x00011d91
> 012273 r12:0x681af5c8  lr:0xb1334673  pc:0x7e3cdeb8 psr:0x2266a80b
> 012273 ICSR:0x00411803 HFSR:0x40000000 CFSR:0x00040000
> 012273 BFAR:0xe000ed38 MMFAR:0xe000ed34

Looking at your registers, they seem to be garbage, so I’m guessing stack
corruption of some sort; it does not have to be an overflow.
Try turning on OS_CRASH_STACKTRACE, or manually walk the stack looking for things which
look like pointers to text.