You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mynewt.apache.org by Christopher Collins <ch...@runtime.io> on 2018/10/18 18:13:27 UTC

Unlikely recoverable failures

Hello all,

I think Mynewt lacks a mechanism for dealing with a certain class of
errors: unlikely recoverable failures.  For the purposes of this email,
I divide failures into two groups:

    1. Failures we cannot recover from, or that we don't want to recover
       from.

This group includes: immediate bugs in the code, memory corruption,
software watchdog expiry, etc.

The best way to handle these failures is to crash immediately, causing
gdb to hit a breakpoint and / or creating a core dump.  We already have
a tool for this: assert().

    2. Failures we want to recover from in production, but not always
       during development.

This group includes: failure of nonessential hardware, dynamic memory
allocation failures, file system write failures due to insufficient
space, etc.

For this second group, I think we need a "crash()" macro that can be
enabled or disabled at compile time[1].  During development and debugging,
the macro would crash the system.  In production builds, the macro would
expand to nothing, allowing the recovery code to execute.

I would like to implement such a macro in mynewt-core.  Here are some of
my thoughts on the subject.  As always, please don't hesitate to express
any criticism.

A. No conditional.

Unlike `assert()`, this new macro should not evaluate an expression to
determine success or failure.  Instead, the calling code should detect
failure with an `if` statement or similar, and only invoke the macro in
the failure case.  Since these failures are expected, I think it is
likely that the application will always want to execute some code in the
failure case, regardless of whether the macro is configured to trigger a
crash.  For example: logging a message to the console when the failure
is detected.  If the macro itself detects the failure, the application
doesn't have the opportunity to execute any code before the crash.

B. No severity.

Initially, I was thinking we could have a severity associated with each
failure.  The new macro would only trigger a crash if invoked with a
severity less than or equal to some `PANIC_LEVEL` setting.  However, I
decided that this just complicates the feature without adding any real
value.  I can't think of a good way to assign a severity to each
failure.  If we ever need this, we can add it on top of the
single-severity implementation.

I do think the ability to enable this feature per-package would be
useful, so that is something to consider for the future.

C. Name?

This is what has been causing me the most agony: what to name the macro.
The names I hate the least are:

    * DBG_PANIC() // "Debug panic"
    * DEV_PANIC() // "Development panic"

But I am not crazy about these.  Any other suggestions?  Even though
all-caps is ugly, I do think it should be used here to make it obvious
that this construct does very macro-like things (inspects the file and
line number, possibly expands to nothing, etc.).

Thanks,
Chris

[1] `assert()` can be enabled and disabled via the `NDEBUG` macro.
However, I am of the opinion that asserts are too useful to ever
disable, so I would like an additional level of configurability here.

Re: Unlikely recoverable failures

Posted by will sanfilippo <wi...@runtime.io>.
Chris:

Sorry was away so did not comment on this prior to your implementation. All sounds good; I like DEBUG_PANIC for the name (I liked it better than DEV_PANIC).

Will


> On Oct 19, 2018, at 11:14 AM, Christopher Collins <ch...@runtime.io> wrote:
> 
> FYI- I have submitted a PR that implements this new macro:
> https://github.com/apache/mynewt-core/pull/1471
> 
> I went with `DEBUG_PANIC()` for the name.
> 
> Chris
> 
> On Thu, Oct 18, 2018 at 11:13:27AM -0700, Christopher Collins wrote:
>> Hello all,
>> 
>> I think Mynewt lacks a mechanism for dealing with a certain class of
>> errors: unlikely recoverable failures.  For the purposes of this email,
>> I divide failures into two groups:
>> 
>>    1. Failures we cannot recover from, or that we don't want to recover
>>       from.
>> 
>> This group includes: immediate bugs in the code, memory corruption,
>> software watchdog expiry, etc.
>> 
>> The best way to handle these failures is to crash immediately, causing
>> gdb to hit a breakpoint and / or creating a core dump.  We already have
>> a tool for this: assert().
>> 
>>    2. Failures we want to recover from in production, but not always
>>       during development.
>> 
>> This group includes: failure of nonessential hardware, dynamic memory
>> allocation failures, file system write failures due to insufficient
>> space, etc.
>> 
>> For this second group, I think we need a "crash()" macro that can be
>> enabled or disabled at compile time[1].  During development and debugging,
>> the macro would crash the system.  In production builds, the macro would
>> expand to nothing, allowing the recovery code to execute.
>> 
>> I would like to implement such a macro in mynewt-core.  Here are some of
>> my thoughts on the subject.  As always, please don't hesitate to express
>> any criticism.
>> 
>> A. No conditional.
>> 
>> Unlike `assert()`, this new macro should not evaluate an expression to
>> determine success or failure.  Instead, the calling code should detect
>> failure with an `if` statement or similar, and only invoke the macro in
>> the failure case.  Since these failures are expected, I think it is
>> likely that the application will always want to execute some code in the
>> failure case, regardless of whether the macro is configured to trigger a
>> crash.  For example: logging a message to the console when the failure
>> is detected.  If the macro itself detects the failure, the application
>> doesn't have the opportunity to execute any code before the crash.
>> 
>> B. No severity.
>> 
>> Initially, I was thinking we could have a severity associated with each
>> failure.  The new macro would only trigger a crash if invoked with a
>> severity less than or equal to some `PANIC_LEVEL` setting.  However, I
>> decided that this just complicates the feature without adding any real
>> value.  I can't think of a good way to assign a severity to each
>> failure.  If we ever need this, we can add it on top of the
>> single-severity implementation.
>> 
>> I do think the ability to enable this feature per-package would be
>> useful, so that is something to consider for the future.
>> 
>> C. Name?
>> 
>> This is what has been causing me the most agony: what to name the macro.
>> The names I hate the least are:
>> 
>>    * DBG_PANIC() // "Debug panic"
>>    * DEV_PANIC() // "Development panic"
>> 
>> But I am not crazy about these.  Any other suggestions?  Even though
>> all-caps is ugly, I do think it should be used here to make it obvious
>> that this construct does very macro-like things (inspects the file and
>> line number, possibly expands to nothing, etc.).
>> 
>> Thanks,
>> Chris
>> 
>> [1] `assert()` can be enabled and disabled via the `NDEBUG` macro.
>> However, I am of the opinion that asserts are too useful to ever
>> disable, so I would like an additional level of configurability here.


Re: Unlikely recoverable failures

Posted by Christopher Collins <ch...@runtime.io>.
FYI- I have submitted a PR that implements this new macro:
https://github.com/apache/mynewt-core/pull/1471

I went with `DEBUG_PANIC()` for the name.

Chris

On Thu, Oct 18, 2018 at 11:13:27AM -0700, Christopher Collins wrote:
> Hello all,
> 
> I think Mynewt lacks a mechanism for dealing with a certain class of
> errors: unlikely recoverable failures.  For the purposes of this email,
> I divide failures into two groups:
> 
>     1. Failures we cannot recover from, or that we don't want to recover
>        from.
> 
> This group includes: immediate bugs in the code, memory corruption,
> software watchdog expiry, etc.
> 
> The best way to handle these failures is to crash immediately, causing
> gdb to hit a breakpoint and / or creating a core dump.  We already have
> a tool for this: assert().
> 
>     2. Failures we want to recover from in production, but not always
>        during development.
> 
> This group includes: failure of nonessential hardware, dynamic memory
> allocation failures, file system write failures due to insufficient
> space, etc.
> 
> For this second group, I think we need a "crash()" macro that can be
> enabled or disabled at compile time[1].  During development and debugging,
> the macro would crash the system.  In production builds, the macro would
> expand to nothing, allowing the recovery code to execute.
> 
> I would like to implement such a macro in mynewt-core.  Here are some of
> my thoughts on the subject.  As always, please don't hesitate to express
> any criticism.
> 
> A. No conditional.
> 
> Unlike `assert()`, this new macro should not evaluate an expression to
> determine success or failure.  Instead, the calling code should detect
> failure with an `if` statement or similar, and only invoke the macro in
> the failure case.  Since these failures are expected, I think it is
> likely that the application will always want to execute some code in the
> failure case, regardless of whether the macro is configured to trigger a
> crash.  For example: logging a message to the console when the failure
> is detected.  If the macro itself detects the failure, the application
> doesn't have the opportunity to execute any code before the crash.
> 
> B. No severity.
> 
> Initially, I was thinking we could have a severity associated with each
> failure.  The new macro would only trigger a crash if invoked with a
> severity less than or equal to some `PANIC_LEVEL` setting.  However, I
> decided that this just complicates the feature without adding any real
> value.  I can't think of a good way to assign a severity to each
> failure.  If we ever need this, we can add it on top of the
> single-severity implementation.
> 
> I do think the ability to enable this feature per-package would be
> useful, so that is something to consider for the future.
> 
> C. Name?
> 
> This is what has been causing me the most agony: what to name the macro.
> The names I hate the least are:
> 
>     * DBG_PANIC() // "Debug panic"
>     * DEV_PANIC() // "Development panic"
> 
> But I am not crazy about these.  Any other suggestions?  Even though
> all-caps is ugly, I do think it should be used here to make it obvious
> that this construct does very macro-like things (inspects the file and
> line number, possibly expands to nothing, etc.).
> 
> Thanks,
> Chris
> 
> [1] `assert()` can be enabled and disabled via the `NDEBUG` macro.
> However, I am of the opinion that asserts are too useful to ever
> disable, so I would like an additional level of configurability here.