Posted to dev@nuttx.apache.org by Nathan Hartman <ha...@gmail.com> on 2022/09/05 20:24:06 UTC

Crash in ostest prioinherit?

Has anyone ended up in __stack_chk_fail() from nxsig_nanosleep()?

I am running ostest on tiva and consistently getting a panic in the
"priority_inheritance: Restoration Test" (priority_inheritance() in
apps/testing/ostest/prioinherit.c).

It always happens in the same place:

The above-mentioned function creates 3 tasks called Task0, Task1,
Task2; grep for NUMBER_OF_COMPETING_THREADS to find the place where
these three tasks are started.

The task main function is adversary() in the same file; adversary()
looks like this:

static int adversary(int argc, FAR char *argv[])
{
  int index        = atoi(argv[1]);
  int inital_delay = atoi(argv[2]);
  int hold_delay   = atoi(argv[3]);

  sleep_and_display(index, inital_delay);
  printf("priority_inheritance: "
         "%s Started, waiting %d uS to take count\n", argv[0], inital_delay);
  sem_wait(&g_sem);
  sleep_and_display(index,  hold_delay);
  sem_post(&g_sem);
  printf("priority_inheritance: %s Posted\n", argv[0]);
  sleep_and_display(index, 0);
  return 0;
}

It runs sleep_and_display(), which calls usleep(), which eventually
calls nxsig_nanosleep(). When nxsig_nanosleep() finishes and is about to
return, the stack smashing protection kicks in and we end up in
__stack_chk_fail(). I think this is happening in Task2, or is in some
way connected to context switching between Task2 and another task
because of the sleep.
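
For readers unfamiliar with the mechanism, here is a toy model of what
the stack protector checks on function return. It is only a sketch, not
the compiler's or NuttX's implementation: demo_frame, demo_call() and
DEMO_GUARD are hypothetical names invented for illustration, and a real
build compares against __stack_chk_guard in a compiler-generated
epilogue. Note that the check only reports that the canary changed; it
does not say who changed it, so the function that trips the check is
not necessarily the one that did the damage.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard value, used only in this illustration. */
#define DEMO_GUARD 0xdeadbeefu

/* Toy model of a protected frame: local storage followed by a canary. */
struct demo_frame
{
  uint8_t  locals[8];
  uint32_t canary;
};

/* Simulate a function that writes 'len' bytes starting at its locals,
 * then runs the epilogue check.  Returns true if the canary survived;
 * false marks the point where real code would call __stack_chk_fail().
 */
bool demo_call(size_t len)
{
  struct demo_frame frame;
  uint8_t *raw = (uint8_t *)&frame;

  frame.canary = DEMO_GUARD;            /* Prologue: plant the canary */

  if (len > sizeof(frame))
    {
      len = sizeof(frame);              /* Stay inside the toy frame */
    }

  memset(raw, 0x41, len);               /* Body: may run past locals[] */

  return frame.canary == DEMO_GUARD;    /* Epilogue: verify the canary */
}
```

Writing 8 bytes leaves the canary intact; writing 12 bytes overwrites it
and the check fails.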

So far I can't seem to figure out why.

Thanks,
Nathan

Re: Crash in ostest prioinherit?

Posted by Nathan Hartman <ha...@gmail.com>.
On Wed, Sep 7, 2022 at 11:11 AM Nathan Hartman <ha...@gmail.com> wrote:
>
> On Tue, Sep 6, 2022 at 5:36 AM Fotis Panagiotopoulos
> <f....@gmail.com> wrote:
> >
> > Hello,
> >
> > Priority inheritance has a known bug, and it is not working correctly.
> > See issue #6310: https://github.com/apache/incubator-nuttx/issues/6310
> >
> > I had to disable it in our application, as it causes lots of problems.
> >
> > As far as I can see, there are a couple of proposals for fixing this, but
> > none of them has been merged yet.
> >
> > I would also like to express interest in getting this fixed. It is
> > important for our application.
> > Maybe the fix should be included in the upcoming release?

Hi all,

Can we get some more eyes to look at PR-6318 [1] please? This PR aims
to fix the issues with priority inheritance.

If possible, please try running ostest on real hardware and report what
happens with the priority inheritance tests.

I have been trying to hunt down a pesky hardfault as described earlier
in this thread. Unfortunately, all the boards I have available are
armv7-m, so I cannot tell whether other architectures give
different/better/worse results.

[1] https://github.com/apache/incubator-nuttx/pull/6318

Thanks,
Nathan

Re: Crash in ostest prioinherit?

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Sep 6, 2022 at 5:36 AM Fotis Panagiotopoulos
<f....@gmail.com> wrote:
>
> Hello,
>
> Priority inheritance has a known bug, and it is not working correctly.
> See issue #6310: https://github.com/apache/incubator-nuttx/issues/6310
>
> I had to disable it in our application, as it causes lots of problems.
>
> As far as I can see, there are a couple of proposals for fixing this, but
> none of them has been merged yet.
>
> I would also like to express interest in getting this fixed. It is
> important for our application.
> Maybe the fix should be included in the upcoming release?


Indeed.

The other day, I saw the failures on a custom board which does not have any
UART exposed. (Just FYI it's using a TM4C129ENCZAD which is a BGA package, so
I can't even hack some kind of UART connection to it.) It has only a network
connection and I was accessing NSH by telnet. I'm explaining this because that
made it impossible to see various debug output that was logged, since the OS
crash means that the network stack was not working anymore.

So yesterday I added board support for the EK-TM4C129EXL board from TI, which
uses a very close relative from the same family, the TM4C129ENCPDT. This is
basically the same chip but in an LQFP package. The board exposes a UART
through the ICDI interface, so it is possible to see more debug output.

I upstreamed this board support here:

https://github.com/apache/incubator-nuttx/pull/7023

It includes a config called 'ostest':

$ tools/configure.sh tm4c129e-launchpad:ostest

That config does not include priority inheritance and ostest runs
successfully. (Though it obviously does not run the priority inheritance test,
where the crash is.)

If you turn on priority inheritance ('make menuconfig' and turn on
CONFIG_PRIORITY_INHERITANCE and make no other changes) then the crash occurs:

[[[

...
lowpri_thread-3: Okay... I'm done!
lowpri_thread-2: Okay... I'm done!
               g_highstate[0]: 3
               g_highstate[1]: 3
               g_highstate[2]: 3
lowpri_thread-1: SUCCESS priority before sem_post: 1
lowpri_thread-1: SUCCESS final priority: 1
lowpri_thread-1: Okay... I'm done!
priority_inheritance: Waiting for lowpri_thread-2 to complete
priority_inheritance: Waiting for lowpri_thread-3 to complete
priority_inheritance: Restoration Test:
priority_inher^A\205\271\215\225Q\205\315\255\245\377BF
irq_unexpected_isr: ERROR irq: 3
up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 54
task: Idle Task
arm_registerdump: R0: 200092c0 R1: 0000001c R2: 00000000  R3: deadbeef
arm_registerdump: R4: 00000000 R5: 55000000 R6: 00000000  FP: 00000000
arm_registerdump: R8: 00000000 SB: 00000000 SL: 00000000 R11: 2001153c
arm_registerdump: IP: 000070e0 SP: 20011350 LR: 0000b8e9  PC: 0000b8a2
arm_registerdump: xPSR: a1000000 PRIMASK: 00000001 CONTROL: 00000000
arm_registerdump: EXC_RETURN: fffffff9
arm_dump_stack: User Stack:
arm_dump_stack: sp:     20011210
arm_dump_stack:   base: 20008ed4
arm_dump_stack:   size: 000003e8
arm_dump_stack: ERROR: User Stack pointer is not within the stack
arm_showtasks:    PID    PRI     STACK      USED   FILLED    COMMAND

]]]
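
The "User Stack pointer is not within the stack" line can be checked by
hand from the values in the dump. The sketch below assumes that "base"
is the lowest address of the stack and that the stack spans
[base, base + size]; sp_within_stack() is a hypothetical helper written
for this note, not a NuttX function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if 'sp' lies inside [base, base + size].
 * Assumption: 'base' is the lowest address of the stack region.
 * The subtraction form avoids overflow when base + size would wrap.
 */
bool sp_within_stack(uint32_t sp, uint32_t base, uint32_t size)
{
  return sp >= base && (sp - base) <= size;
}
```

With the values above (sp 0x20011210, base 0x20008ed4, size 0x3e8) the
stack would end at 0x200092bc, so the reported sp is far outside it and
the helper returns false.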

The bug(s) you mentioned in connection with priority inheritance have to do
with restoring the original priority of a task after it has been boosted. So
if I understand correctly, the impact of that bug is that a low priority task
will end up keeping a higher priority than it deserves. However, that bug
shouldn't (in my opinion) lead to a crash (even if it does cause other bad
things to happen, like important tasks not running).
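
To make the expected behavior concrete, here is a deliberately
simplified model of boosting and restoring a priority. It is a sketch
under my own assumptions, not the NuttX implementation: toy_task,
toy_boost() and toy_release() are hypothetical names, and the model
ignores the harder cases (a holder with several semaphores or several
waiters) where restoration has to pick the highest remaining waiter
instead of simply returning to the base priority.

```c
/* Toy task: the priority it was created with and the priority the
 * scheduler currently uses for it.
 */
struct toy_task
{
  int base_prio;
  int sched_prio;
};

/* A waiter blocks on a semaphore held by 'holder': the holder inherits
 * the waiter's priority if it is higher than its own.
 */
void toy_boost(struct toy_task *holder, const struct toy_task *waiter)
{
  if (waiter->sched_prio > holder->sched_prio)
    {
      holder->sched_prio = waiter->sched_prio;
    }
}

/* The holder posts the semaphore: in this single-semaphore model the
 * boost is dropped and the base priority is restored.
 */
void toy_release(struct toy_task *holder)
{
  holder->sched_prio = holder->base_prio;
}
```

In this model a task created at priority 1 that is boosted to 200 must
be back at 1 after the release. The restoration bug discussed above
corresponds to that last step not happening, which leaves the task at
200.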

I wonder if the crash I'm seeing is another bug, which could be related to
priority inheritance or not. My custom board was having intermittent crashes,
which is what caused me to go down this rabbit hole of the ostest in the first
place.

I'd like to get to the bottom of those spurious crashes first. I'll try
running that board without priority inheritance for a while and report back...

If anyone has input, please speak up!

Cheers,
Nathan

Re: Crash in ostest prioinherit?

Posted by Nathan Hartman <ha...@gmail.com>.
On Tue, Sep 6, 2022 at 5:36 AM Fotis Panagiotopoulos
<f....@gmail.com> wrote:
> Priority inheritance has a known bug, and it is not working correctly.
> See issue #6310: https://github.com/apache/incubator-nuttx/issues/6310
>
> I had to disable it in our application, as it causes lots of problems.

Thanks for pointing that out. I will retest with priority inheritance
disabled and report my findings...

Nathan

Re: Crash in ostest prioinherit?

Posted by Fotis Panagiotopoulos <f....@gmail.com>.
Hello,

Priority inheritance has a known bug, and it is not working correctly.
See issue #6310: https://github.com/apache/incubator-nuttx/issues/6310

I had to disable it in our application, as it causes lots of problems.

As far as I can see, there are a couple of proposals for fixing this, but
none of them has been merged yet.

I would also like to express interest in getting this fixed. It is
important for our application.
Maybe the fix should be included in the upcoming release?
