Posted to dev@mynewt.apache.org by Christopher Collins <ch...@runtime.io> on 2019/02/28 02:08:19 UTC

Mynewt test facilities

Hello all,

In this email, I would like to discuss Mynewt's test facilities.

### INTRO (AKA TL;DR)

Unit tests come in two flavors: 1) simulated, and 2) real hardware.
Writing a test capable of running in both environments is wasted
effort.  Let's revisit Mynewt's test facilities with a clear
understanding that they must solve two different use cases.

In this email, I want to:
    1. Clarify the differences between the two test environments.
    2. Propose some high-level changes to Mynewt's test facilities.

### TEST ENVIRONMENTS

The two test environments are summarized below.

##### SIMULATED

Simulated tests ("self tests"):
    * Are executed with `newt test`.
    * Automatically execute a set of tests.  No user intervention
      required.
    * Report results via stdout/stderr and exit status.
    * Terminate when the tests are complete.

A typical self test is a very basic application.  Usually, a self test
contains the package-under-test, the testutil package, and not much
else.  Most self tests don't even include the OS scheduler; they are
simply a single-threaded sequence of test cases.  For these simple
tests, `sysinit()` can be executed between each test case, obviating
the need for "setup" and "teardown" code.  Self tests are allowed to
crash; such crashes get reported as test failures.  Finally, self tests
have effectively unlimited memory at their disposal.
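The shape of a self test can be sketched in plain, Mynewt-free C. Everything here is a stand-in for illustration: `sysinit()` is a stub that just resets the state the package-under-test would own, and `run_self_tests()` plays the role of the generated test runner.

```c
#include <assert.h>
#include <stdio.h>

/* Stub standing in for Mynewt's sysinit(); resets package state. */
static int pkg_counter;
static void sysinit(void) { pkg_counter = 0; }

static int failures;

static void
test_case_increment(void)
{
    sysinit();                      /* fresh state; no setup code */
    pkg_counter++;
    if (pkg_counter != 1) {
        failures++;
    }
}

static void
test_case_starts_clean(void)
{
    sysinit();                      /* prior case's changes are gone */
    if (pkg_counter != 0) {
        failures++;
    }
}

/* A self test is a single-threaded sequence of cases; results are
 * reported via stdout/stderr and the process exit status. */
int
run_self_tests(void)
{
    test_case_increment();
    test_case_starts_clean();
    printf("%d failure(s)\n", failures);
    return failures;
}
```

Note there is no scheduler and no teardown code: calling `sysinit()` at the top of each case is what keeps the cases independent.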

##### REAL HARDWARE

Test apps that run on real hardware ("hw tests"):
    * Run a test server.  The user executes tests on demand with the
      newtmgr `run` command.
    * Typically contain test suites from several packages.
    * Record results to a Mynewt log.
    * Run indefinitely.  The device does not reset itself between
      tests.

Hw tests are more complex than self tests.  These test apps need to
include the scheduler, the newtmgr server, a transport (e.g., BLE), and
a BSP and all its drivers.  Test code cannot assume a clean state at
the start of each test.  Instead, the code must perform manual clean up
after each test case.  Hw test cases must be idempotent and they must
not crash.  Finally, memory is constrained in this environment.

##### PREFER SELF

Clearly, self tests are easier to write than hw tests.  Equally
important, self tests are easier to run - they can run in automated CI
systems without the need for a complicated test setup.

So self tests should always be preferred over hw tests.  Hw tests should
only be used when necessary, i.e., when testing drivers or the system
as a whole.  I won't go so far as to say there is never a reason to run
the same test in both environments, but it is so rarely needed that it
is OK if it requires some extra effort from the developer.

### TEST FACILITIES

Mynewt has two unit test libraries: `testutil` and `runtest`.

##### TESTUTIL

Testutil is a primitive unit test framework.  There really isn't
anything novel about this library - suites, cases, pass, fail, you get
the idea :).  This library is used in both test environments.

I have one concern about this library:

    In self tests, does `sysinit()` get called automatically at the
    start of each test case?

Currently, the answer is no.

##### RUNTEST

Runtest is a grab bag of "other" test functionality:

    1. Command handler for the `run` newtmgr command.
    2. API for logging test results.
    3. Generic task creation facility for multithreaded tests.

Features 1 and 2 are only needed by hw tests.  Feature 3 is needed by
both self tests and hw tests.

### PROPOSALS

1. (testutil): Modify `TEST_CASE()` to call
`sysinit()` if `MYNEWT_VAL(SELFTEST)` is enabled.

2. (runtest): Remove the task-creation functionality from `runtest`.
This functionality can go in a new library (call it "taskpool" for the
sake of discussion).

3. (taskpool): Further, I think the taskpool functionality could be
tailored specifically toward unit tests.  The remainder of this email
deals with this proposal.

I will use an example as motivation for this library.  Here is a simple
test which uses our to-be-defined taskpool library.  This example tests
that a `cbmem` can handle multiple tasks writing to it at the same
time.  Note: this is not meant to be good test code :).

    static struct cbmem cbm;

    /**
     * Low-priority task handler.  Writes 20 entries to the cbmem.
     * Uses a rate of 1 OS tick per write.
     */
    static void
    task_lo(void *arg)
    {
        int i;

        for (i = 0; i < 20; i++) {
            cbmem_append(&cbm, "hello from task_lo", 18);
            os_time_delay(1);
        }
    }

    /**
     * High-priority task handler.  Writes 10 entries to the cbmem.
     * Uses a rate of 2 OS ticks per write.
     */
    static void
    task_hi(void *arg)
    {
        int i;

        for (i = 0; i < 10; i++) {
            cbmem_append(&cbm, "hello from task_hi", 18);
            os_time_delay(2);
        }
    }

    TEST_CASE(cbmem_thread_safety) {
        /* Initialize cbmem. */
        uint8_t buf[1000];
        cbmem_init(&cbm, buf, sizeof buf);

        /* Create and run the two test tasks. */
        taskpool_create(task_hi, 10 /* Priority */);
        taskpool_create(task_lo, 20 /* Priority */);

        /* Wait for both tasks to complete.  Fail after three seconds. */
        taskpool_wait(3 * OS_TICKS_PER_SEC);

        /* Test result automatically reported here. */
    }

This example illustrates a few requirements of taskpool:

    1) A taskpool task (tp-task) is allowed to "run off" the end of its
       handler function.  Normally, it is a fatal error if a Mynewt
       task handler returns.

    2) Client code can block until all tp-tasks have terminated.

In multithreaded test code, tasks are typically short-lived and simple.
Mynewt's test facilities should make it easy to create such tasks.

There is one more requirement that isn't expressed in the example, but
which is important:

    3) Client code can abort all tp-tasks before they return.  This is
       needed when a fatal test failure occurs.

I have some ideas for how this could be implemented, but I think that
is best left for another email (or a PR!).  Any concerns or ideas about
implementation are welcome, but I am particularly interested in high
level criticism:

    * Does this seem useful for unit testing?
    * Is any major functionality missing?
    * Any completely different ideas that are better than this one?

Thanks,
Chris

Re: Mynewt test facilities

Posted by Christopher Collins <ch...@runtime.io>.
Thanks, Will.

Responses to inline comments inline :).

On Sun, Mar 03, 2019 at 06:18:46PM -0800, will sanfilippo wrote:
> > ### PROPOSALS
> > 
> > 1. (testutil): Modify `TEST_CASE()` to call
> > `sysinit()` if `MYNEWT_VAL(SELFTEST)` is enabled.
> Would we want all TEST_CASE() to call sysinit if a single syscfg is
> defined? Maybe for some tests you want it and some you don't?

Before I address this, I want to clarify my proposal a bit.  I did a
poor job of setting up the context, so I'll try again here.  Also, you
would get tired of seeing "IMHO" at the end of every sentence, so I
won't write it.  Please just understand that all the below is just my
opinion :).

Test cases should be as self-contained as possible.  We don't want any
restrictions on how test cases are ordered.  For example, something like
this is bad (hypothetical):

    /* Test basic FCB functionality. */
    test_case_fcb_basic();

    /* The above test initialized and populated the FCB.  Make sure we
     * can delete an entry from it.
     */
    test_case_fcb_delete();

The ordering of these test cases matters.  The second case depends on
side effects from the first case.  These kinds of tests are difficult to
maintain, and often it is difficult to tell what is even being tested.

Every test case should start from a clean state.  If
`test_case_fcb_delete()` needs a pre-populated FCB, it should start by
initializing and populating an FCB with the exact contents it needs.

This is why it is good to execute `sysinit()` at the start of each test
case - it lets us clean up state left over from prior test cases.  This
eliminates a lot of manual clean up code, and it helps ensure
correctness of old tests as new ones are added.

Ideally we could just call `sysinit()` at the start of each test case,
whether the test is simulated or performed on real hardware.
Unfortunately, we can't call `sysinit()` in hw tests.  The problem is
that hw tests specifically need to maintain some state between test
cases.  Calling `sysinit()` in a hw test case would reset the runtest
package, reinitialize the test result logs, and otherwise make the
system unusable as a test server.

So I think we should call `sysinit()` at the start of a test case
whenever possible.  In other words, call `sysinit()` for every self
test.  I don't think we should make this behavior conditional on a
syscfg setting, but as I said, I am totally open to opposing viewpoints.

I do think you raise a good point, though.  The semantics of
`TEST_CASE()` should not differ between self tests and hw tests.
Instead, we should create a new macro which creates a test case and
calls `sysinit()` at the top.

That is:

    `TEST_CASE()` - creates a test case that does NOT call `sysinit()`.
    `TEST_CASE_SELF()` - creates a test case that DOES call `sysinit()`.

(macro name off the top of my head.)

Self tests would use `TEST_CASE_SELF()`; hw tests would use
`TEST_CASE()`.
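A minimal sketch of how `TEST_CASE_SELF()` could wrap `sysinit()` around the test body. The macros and the `sysinit()` stub below are stand-ins for illustration, not testutil's actual implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Stub standing in for Mynewt's sysinit(). */
static int sysinit_calls;
static void sysinit(void) { sysinit_calls++; }

/* Simplified stand-in for testutil's TEST_CASE() macro. */
#define TEST_CASE(name) void name(void)

/* TEST_CASE_SELF() defines a case that calls sysinit() first, then
 * falls through to the test body in a generated helper function. */
#define TEST_CASE_SELF(name)            \
    static void name ## _body(void);    \
    void name(void)                     \
    {                                   \
        sysinit();                      \
        name ## _body();                \
    }                                   \
    static void name ## _body(void)

TEST_CASE_SELF(my_case) {
    /* Test body; runs after sysinit() has reset the system. */
    printf("running after %d sysinit call(s)\n", sysinit_calls);
}
```

With this split, the semantics of `TEST_CASE()` stay identical in both environments, and opting in to the `sysinit()` behavior is explicit at the definition site.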

At first, I thought it would be confusing to have two macros.  Now I
think it is even more confusing to have one macro with varying
semantics.

I have some more changes I would like to make to the testutil API, but I
will save that for a future email (or PR).

> > This example illustrates a few requirements of taskpool:
> > 
> >    1) A taskpool task (tp-task) is allowed to "run off" the end of its
> >       handler function.  Normally, it is a fatal error if a Mynewt
> >       task handler returns.
> > 
> Not sure why I do not like this, but is there some need to have a task
> basically return?

It is not strictly needed, but it is a convenient way for a task to
signal that it has completed.  Otherwise, each of these temporary tasks
would need to set some condition variable and keep looping.  That isn't
terribly complicated, but I do think it is more complicated than this
needs to be.  These tasks are not meant to loop endlessly like most, so
why require them to be written that way?

Just to clarify - I am not proposing a change to the kernel scheduler.
The handlers for these tp-tasks would just be wrapped with something
that provides this special behavior.

> Seems fairly reasonable. Is taskpool_create() designed to save the test
> developer some time creating tasks or does it have some other
> functionality?

Its main purpose is to allow tasks to be reused among a set of test
cases.  This isn't so important for self tests, but in hw tests, it is
wasteful for each test case to allocate its own pool of tasks and
stacks.  Since only one test runs at a time, these test cases can share
a pool of tasks.

Thanks,
Chris

Re: Mynewt test facilities

Posted by will sanfilippo <wi...@runtime.io>.
Some comments inline. 

> On Feb 27, 2019, at 6:08 PM, Christopher Collins <ch...@runtime.io> wrote:
> 
> Hello all,
> 
> In this email, I would like to discuss Mynewt's test facilities.
> 
> ### INTRO (AKA TL;DR)
> 
> Unit tests come in two flavors: 1) simulated, and 2) real hardware.
> Writing a test capable of running in both environments is wasted
> effort.  Let's revisit Mynewt's test facilities with a clear
> understanding that they must solve two different use cases.
> 
> In this email, I want to:
>    1. Clarify the differences between the two test environments.
>    2. Propose some high-level changes to Mynewt's test facilities.
> 
> ### TEST ENVIRONMENTS
> 
> The two test environments are summarized below.
> 
> ##### SIMULATED
> 
> Simulated tests ("self tests"):
>    * Are executed with `newt test`.
>    * Automatically execute a set of tests.  No user intervention
>      required.
>    * Report results via stdout/stderr and exit status.
>    * Terminate when the tests are complete.
> 
> A typical self test is a very basic application.  Usually, a self test
> contains the package-under-test, the testutil package, and not much
> else.  Most self tests don't even include the OS scheduler; they are
> simply a single-threaded sequence of test cases.  For these simple
> tests, `sysinit()` can be executed between each test case, obviating
> the need for "setup" and "teardown" code.  Self tests are allowed to
> crash; such crashes get reported as test failures.  Finally, self tests
> have effectively unlimited memory at their disposal.
> 
> ##### REAL HARDWARE
> 
> Test apps that run on real hardware ("hw tests"):
>    * Run a test server.  The user executes tests on demand with the
>      newtmgr `run` command.
>    * Typically contain test suites from several packages.
>    * Record results to a Mynewt log.
>    * Run indefinitely.  The device does not reset itself between
>      tests.
> 
> Hw tests are more complex than self tests.  These test apps need to
> include the scheduler, the newtmgr server, a transport (e.g., BLE), and
> a BSP and all its drivers.  Test code cannot assume a clean state at
> the start of each test.  Instead, the code must perform manual clean up
> after each test case.  Hw test cases must be idempotent and they must
> not crash.  Finally, memory is constrained in this environment.
> 
> ##### PREFER SELF
> 
> Clearly, self tests are easier to write than hw tests.  Equally
> important, self tests are easier to run - they can run in automated CI
> systems without the need for a complicated test setup.
> 
> So self tests should always be preferred over hw tests.  Hw tests should
> only be used when necessary, i.e., when testing drivers or the system
> as a whole.  I won't go so far as to say there is never a reason to run
> the same test in both environments, but it is so rarely needed that it
> is OK if it requires some extra effort from the developer.
> 
> ### TEST FACILITIES
> 
> Mynewt has two unit test libraries: `testutil` and `runtest`.
> 
> ##### TESTUTIL
> 
> Testutil is a primitive unit test framework.  There really isn't
> anything novel about this library - suites, cases, pass, fail, you get
> the idea :).  This library is used in both test environments.
> 
> I have one concern about this library:
> 
>    In self tests, does `sysinit()` get called automatically at the
>    start of each test case?
> 
> Currently, the answer is no.
> 
> ##### RUNTEST
> 
> Runtest is a grab bag of "other" test functionality:
> 
>    1. Command handler for the `run` newtmgr command.
>    2. API for logging test results.
>    3. Generic task creation facility for multithreaded tests.
> 
> Features 1 and 2 are only needed by hw tests.  Feature 3 is needed by
> both self tests and hw tests.
> 
> ### PROPOSALS
> 
> 1. (testutil): Modify `TEST_CASE()` to call
> `sysinit()` if `MYNEWT_VAL(SELFTEST)` is enabled.
Would we want all TEST_CASE() to call sysinit if a single syscfg is defined? Maybe for some tests you want it and some you don't?

> 
> 2. (runtest): Remove the task-creation functionality from `runtest`.
> This functionality can go in a new library (call it "taskpool" for the
> sake of discussion).
> 
> 3. (taskpool) Further, I think the taskpool functionality could be
> tailored specifically towards unit tests.  The remainder of this email
> deals with this proposal
> 
> I will use an example as motivation for this library.  Here is a simple
> test which uses our to-be-defined taskpool library.  This example tests
> that a `cbmem` can handle multiple tasks writing to it at the same
> time.  Note: this is not meant to be good test code :).
> 
>    static struct cbmem cbm;
> 
>    /**
>     * Low-priority task handler.  Writes 20 entries to the cbmem.
>     * Uses a rate of 1 OS tick per write.
>     */
>    static void
>    task_lo(void *arg)
>    {
>        int i;
> 
>        for (i = 0; i < 20; i++) {
>            cbmem_append(&cbm, "hello from task_lo", 18);
>            os_time_delay(1);
>        }
>    }
> 
>    /**
>     * High-priority task handler.  Writes 10 entries to the cbmem.
>     * Uses a rate of 2 OS ticks per write.
>     */
>    static void
>    task_hi(void *arg)
>    {
>        int i;
> 
>        for (i = 0; i < 10; i++) {
>            cbmem_append(&cbm, "hello from task_hi", 18);
>            os_time_delay(2);
>        }
>    }
> 
>    TEST_CASE(cbmem_thread_safety) {
>        /* Initialize cbmem. */
>        uint8_t buf[1000];
>        cbmem_init(&cbm, buf, sizeof buf);
> 
>        /* Create and run the two test tasks. */
>        taskpool_create(task_hi, 10 /* Priority */);
>        taskpool_create(task_lo, 20 /* Priority */);
> 
>        /* Wait for both tasks to complete.  Fail after three seconds. */
>        taskpool_wait(3 * OS_TICKS_PER_SEC);
> 
>        /* Test result automatically reported here. */
>    }
> 
> This example illustrates a few requirements of taskpool:
> 
>    1) A taskpool task (tp-task) is allowed to "run off" the end of its
>       handler function.  Normally, it is a fatal error if a Mynewt
>       task handler returns.
> 
Not sure why I do not like this but is there some need to have a task basically return?

>    2) Client code can block until all tp-tasks have terminated.
> 
> In multithreaded test code, tasks are typically short-lived and simple.
> Mynewt's test facilities should make it easy to create such tasks.
> 
> There is one more requirement that isn't expressed in the example, but
> which is important:
> 
>    3) Client code can abort all tp-tasks before they return.  This is
>       needed when a fatal test failure occurs.
> 
> I have some ideas for how this could be implemented, but I think that
> is best left for another email (or a PR!).  Any concerns or ideas about
> implementation are welcome, but I am particularly interested in high
> level criticism:
> 
>    * Does this seem useful for unit testing?
>    * Is any major functionality missing?
>    * Any completely different ideas that are better than this one?
> 
Seems fairly reasonable. Is taskpool_create() designed to save the test developer some time creating tasks or does it have some other functionality?

> Thanks,
> Chris