Posted to dev@tvm.apache.org by Christopher Sidebottom via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/05/11 13:21:37 UTC

[Apache TVM Discuss] [RFC] [uTVM] Embedded C Runtime Interface


# Summary

This RFC outlines a set of additional APIs for the C Runtime that enable direct calling of an AOT micro entrypoint (https://discuss.tvm.apache.org/t/rfc-utvm-aot-optimisations-for-embedded-targets/9849) from a model descriptor which includes some model metadata. This is an alternative to the packed function API when working in embedded environments.
 
```c
typedef struct {
	/* ...metadata... */
	TVMMicroEntryPoint* entrypoint;
} TVMModel; // Model descriptor to be used in static linkage
typedef struct {
	/* ... */
	void** workspace;
} TVMContext; // Context configuration for minimal environments

// Execution function to execute a model in a given context
static inline int32_t TVMExecute(const TVMModel* model, void** inputs, void** outputs, TVMContext* context);
// Workspace setup function to assign the workspace to the context
static inline void TVMSetWorkspaces(TVMContext* context, void** workspace);
// Workspace size retrieval
static inline size_t TVMGetWorkspaceSize(const TVMModel* model, size_t workspace_index);
```

# Motivation

As illustrated by @stoa in https://discuss.tvm.apache.org/t/rfc-standalone-code-generation-and-c-runtime-for-stm32-bare-metal-devices/9562, an embedded-specific entrypoint into TVM is desirable. In order to access AOT from an embedded environment, it makes sense to provide a stable user-facing API, so that underlying changes in the output model can be transparent to system integrators. Providing stable interfaces to the facilities of the existing C runtime gives similar guarantees and ease of use to those not using the packed function signature in TVM. It also gives TVM developers the ability to change the underlying micro runtime as TVM evolves, behind a stable outward-facing interface.

One of the principles of the micro entrypoint is that it provides minimal overhead when running in an embedded system; a similarly minimal way to run a simple model is therefore introduced, which can be augmented by the wider C Runtime.

# Guide-level explanation

This RFC aims to introduce the concepts needed to call the AOT micro entrypoint from an embedded application. As a starting point, this proposal includes:

* A model descriptor to give richer information about the model and wrap the micro entrypoint
* A model context to store embedded environment information
* Initial functions for managing memory workspaces

A user can include these as additional headers providing a thin and stable interface to the AOT execution entrypoint. Instead of having:

*user_app.c*
```c
extern const TVMModel my_model;
my_model.entrypoint(inputs, outputs, &my_context);
```

And having to understand the calling pattern of the AOT output, they can instead use:

*user_app.c*
```c
#include "tvm_micro_runtime.h"
extern const TVMModel my_model;
TVMExecute(&my_model, inputs, outputs, &my_context);
```

This would be achieved by using minimal inline functions to mask the internal structure of TVMModel, such as:

*tvm_micro_runtime.h*
```c
#include "tvm_micro_backend.h"
static inline int32_t TVMExecute(const TVMModel* model, void** inputs, void** outputs, TVMContext* context) {
	return model->entrypoint(inputs, outputs, context);
}
```
*tvm_micro_backend.h*
```c
typedef struct {
	/* ...metadata... */
	TVMMicroEntryPoint* entrypoint;
} TVMModel; // Model descriptor to be used in static linkage
typedef struct {
	/* ... */
	void** workspace;
} TVMContext; // Context configuration for minimal environments
```

You can see this in two motivating user flows: compiling a model with defaults, and then augmenting it with application-level memory management.

## Default Model Compilation

![](https://confluence.arm.com/download/attachments/759974179/structurizr-barebones.png?version=1&modificationDate=1619713091765&api=v2)

In this flow, the user is using tvmc to generate a model and an associated block of memory is allocated for it:

`tvmc my_model.tflite --executor=aot --target=c --no-typed-operators --micro-entrypoint`

For this flow, no additional context is required and the user can run the code on their device:

```c
extern const TVMModel my_model;
void* inputs[] = {my_data};
void* outputs[] = {output_space};
TVMExecute(&my_model, inputs, outputs, NULL);
```

This is enabled by the TVMModel structure generated by TVM to expose the AOT resources; it can be constant and provided by the compiler output, with relevant metadata for users to query.

## Custom-Workspace Compilation

![](https://confluence.arm.com/download/attachments/759974179/structurizr-baremetal.png?version=3&modificationDate=1619713102527&api=v2)

In this flow, the user is using tvmc to generate a model but specifies the memory available:

`tvmc my_model.tflite --executor=aot --target=c --no-typed-operators --micro-entrypoint --with-memory=size=2048;access=rw`

For this flow, the additional context is required to tell the runtime where the memory exists:

```c
extern const TVMModel my_model;
TVMContext context;

void* inputs[] = {my_data};
void* outputs[] = {output_space};

TVMSetWorkspaces(&context, malloc(TVMGetWorkspaceSize(&my_model, 0)));
TVMExecute(&my_model, inputs, outputs, &context);
```

This works because of the context within which the model runs; it is similar to the DLContext object, but for a minimal runtime it provides only information not hardcoded into the AOT output. By re-using the resource_handle pointer, the embedded context can also be used by operators run using packed functions and normal TVM buffers.
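For illustration, a minimal sketch of how an operator could recover the embedded context from `resource_handle` (the operator name is hypothetical, and the struct is a stand-in for the proposed definition):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the proposed context struct; the real definition is TBD. */
typedef struct {
	void** workspace; /* Pointers to memory regions used as workspaces */
} TVMContext;

/* Hypothetical operator: the embedded context arrives through the same
 * resource_handle parameter used by packed functions. */
static int32_t example_operator(void* resource_handle) {
	TVMContext* context = (TVMContext*)resource_handle;
	void* scratch = context->workspace[0]; /* operator scratch memory */
	return scratch != NULL ? 0 : -1;
}
```

This keeps the operator signature compatible with the existing packed function calling convention while still giving embedded operators access to the context.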

# Reference-level explanation

In this RFC, we are primarily concerned with three areas: a model descriptor which the compiler generates, a context which the user can manipulate, and an API file which binds the two together.

## Model Descriptor

This is a formalisation of the model descriptor found in tvm/runtime/crt/internal/aot_executor/aot_executor.h, which can be used to describe a model via the APIs proposed:

```c
typedef struct {
  uint32_t num_input_tensors;    /** Number of expected input tensors */
  uint32_t num_output_tensors;   /** Number of expected output tensors */
  size_t* workspace_size;         /** Size of workspace required for model to run */
  TVMMicroEntryPoint entrypoint; /** Generated model function, called through tvm_runtime_run */
} TVMModel;
```

This is the generated fixed model descriptor which users can address by name in the outputted code:

`extern const TVMModel my_model;`

Additional fields can be added here alongside suitable getters to retrieve information about a model. Notably, if the workspace isn't specified by the user, it'll default to being pinned within the generated code rather than being user accessible.
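As a sketch of what such a getter could look like (the accessor name is illustrative, and the struct below is a trimmed stand-in for the generated descriptor):

```c
#include <stdint.h>
#include <stddef.h>

/* Trimmed stand-in for the generated model descriptor above. */
typedef struct {
	uint32_t num_input_tensors;
	uint32_t num_output_tensors;
	size_t* workspace_size;
} TVMModel;

/* Accessor sketch: callers query metadata without depending on the
 * struct layout, which remains free to change between TVM releases. */
static inline uint32_t TVMGetNumInputs(const TVMModel* model) {
	return model->num_input_tensors;
}
```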

## Context

Paired with the model descriptor, this provides any contextual information required to run the model, such as an application driven workspace configuration:

```c
typedef struct {
	void** workspace; /** Pointers to different memory to use as a workspace */
} TVMContext;
```

## Micro Entrypoint Runtime API

A header which can be added to the `src/runtime` folder alongside `c_backend_api.h` and `c_runtime_api.h` to provide the correct overlay to the matching C runtime. Using `static inline` functions, each of the individual calls can be kept minimal and provide abstraction on top of the underlying model:

```c
static inline int32_t TVMExecute(const TVMModel* model, void** inputs, void** outputs, TVMContext* context) {
	return model->entrypoint(inputs, outputs, context);
}
static inline size_t TVMGetWorkspaceSize(const TVMModel* model, size_t workspace_index) {
	return model->workspace_size[workspace_index];
}
static inline void TVMSetWorkspaces(TVMContext* context, void** workspace) {
	context->workspace = workspace;
}
```

# Drawbacks

This starts to build up a minimal interface for interacting with TVM, which deviates from the main dynamic linked approach. It's important to keep this layer as minimal as possible to allow other parts of TVM to continue doing the heavy lifting.

Combining this with the core C runtime means maintaining support across an incredibly broad range of devices from single core embedded devices to cloud environments and dynamically loading for autotuning.

# Rationale and alternatives

Integrating with the current C Runtime gives us a way to assess and move forward with embedded-specific changes. Alternatively, an entirely different runtime environment could be created, but this would mean reinventing every aspect of the runtime and would not leverage as much of the existing work.

# Prior art

* The setting up of an application workspace for a TVM model was first demonstrated in https://discuss.tvm.apache.org/t/rfc-standalone-code-generation-and-c-runtime-for-stm32-bare-metal-devices/9562
* AOT introduced the concept of a model descriptor in tvm/runtime/crt/internal/aot_executor/aot_executor.h [within its introductory PR](https://github.com/apache/tvm/pull/7785)

# Unresolved questions

* Is this lightweight enough to allow usage of the C Runtime where useful to embedded applications?
* Should we use the common C snake case style to better match embedded systems rather than the style used by the C runtime?

# Future possibilities

By integrating with and evolving the C Runtime API, TVM can target a broader range of devices than is possible with the current API. This section outlines some of the use cases this could extend into; these are intended to be illustrative and will require their own RFCs.

## Re-use of C Runtime APIs

Using the C runtime provides access to standard interfaces such as multithreading, with an RTOS-specific implementation.

![](https://confluence.arm.com/download/attachments/759974179/structurizr-multithreaded.png?version=1&modificationDate=1619712082407&api=v2)

## Further Model Metadata

Any additional metadata can be added to the `TVMModel` structure as a minimal overhead, allowing for extension into a variety of use cases.

```c
static inline int32_t TVMGetTVMVersionMajor(const TVMModel* model)  {
	return model->compiler_version_major;
}
```

## Shared Workspaces

In this flow, the user disables the generation of a default memory block so as to allow for the application to define that memory:

`tvmc my_model1.tflite --executor=aot --target=c --no-typed-operators --micro-entrypoint --with-memory=size=2048;access=rw`

`tvmc my_model2.tflite --executor=aot --target=c --no-typed-operators --micro-entrypoint --with-memory=size=4096;access=rw`

And this can be loaded into the context for the executor to get the memory from:

```c
TVMContext my_context;
size_t size1 = TVMGetWorkspaceSize(&my_model1, 0);
size_t size2 = TVMGetWorkspaceSize(&my_model2, 0);
size_t max_workspace_required = size1 > size2 ? size1 : size2;
TVMSetWorkspaces(&my_context, malloc(max_workspace_required));
```

## RTOS Device Integration

![](https://confluence.arm.com/download/attachments/759974179/structurizr-accelerator.png?version=1&modificationDate=1619712145836&api=v2)

The context object can be defined per-platform to allow RTOS specific structures to be passed through to the operators:

```c
struct device* my_accel = device_get_binding("ACC_0");
TVMSetDevice(&my_context, my_accel);
```

With an associated header-only platform wrapper, here is an example for the Zephyr RTOS:

```c
#include <device.h>

typedef struct {
  void** workspace;
  struct device* device;
} TVMContext;

static inline void TVMSetDevice(TVMContext* context, struct device* device) {
	context->device = device;
}
```

Alongside new device drivers, this can provide an interface for operators to interact with RTOS drivers directly in the C runtime:

```c
void TVMAcceleratorAccelerate(TVMContext* context, int32_t operation) {
	struct device* device = context->device;
	device_specific_rtos_call(device, operation);
}
```

## Parameter Updating

By uncovering this alternative pathway into a more static execution environment, we can start to provide methods of overwriting aspects of the model, such as existing in-memory parameters:

```c
static inline void TVMSetParameters(TVMContext* context, void** params) {
	context->params = params;
}
```

This can then enable Over-the-Air updates of models on IoT devices.
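As a usage sketch (assuming a `params` field on the context, as above), an application receiving a new parameter blob over the air could swap it in before the next `TVMExecute`:

```c
#include <stddef.h>

/* Hypothetical context extended with a parameter table, as sketched above. */
typedef struct {
	void** workspace;
	void** params;
} TVMContext;

static inline void TVMSetParameters(TVMContext* context, void** params) {
	context->params = params;
}

/* Swap in a freshly received parameter blob; the next model execution
 * using this context would read the updated weights. */
static void apply_ota_update(TVMContext* context, void** new_params) {
	TVMSetParameters(context, new_params);
}
```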





---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951/1) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/c6ddffcf85e327478c02f4ade5a0ddafd85fa9d4e63dc6f1f123d5ad27522ae0).


Posted by Christopher Sidebottom via Apache TVM Discuss <no...@discuss.tvm.ai>.

cc: @areusch @giuseros @stoa @manupa-arm @grant-arm







Posted by Andrew Reusch via Apache TVM Discuss <no...@discuss.tvm.ai>.

cc @MJKlaiber 

@Mousius thanks for splitting this off into another RFC. I agree implementing a low-overhead embedded interface is super important. A couple thoughts:

At a high level, it would be great to explicitly spell out the entire interface we expect to implement here. I think it might be useful to include an entire `main()` program (either here or perhaps linked as a branch if it's long) just to ensure we aren't leaving anything out.

### Runtime vs compile time knowledge

A key question we should tackle here is when model metadata should be available. Basically there are two scenarios:

S1. The user wants to use model metadata in the compilation flow.

S2. The user wants to write functions that make use of model metadata at runtime.

My opinion is we need to support both. So any metadata here e.g. stored in a struct should also be present in some JSON created as part of Model Library Format.

### Model Input and Output Allocation

I think it'd be great to illustrate how we expect users to allocate model inputs and outputs. This is kind of there, but it would be great to propose the thing end-to-end. In particular, I'm curious how a user should size the tensors. One such possible sketch is to generate code like:
```c
typedef struct {
    uint8_t input1[1 * 32 * 32 * 3];   // dimensions are examples
    int8_t input2[10 * 5 * 5 * 3];
} tvm_model_input_t;
```
This allows users with simple memory layout requirements to just declare the struct in the correct memory address space, and fill data as needed. It also serves as documentation-as-code of the required inputs and memory. We could move the buffer sizes to be constants, too. I want to ensure users retain control of all memory allocations, but we should design the API such that the typical case is very easy to use.
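A usage sketch of this idea (dimensions and names are illustrative, as in the struct above): the application declares the struct once, leaving placement to the linker, and copies data straight into the named members:

```c
#include <stdint.h>
#include <string.h>

/* Example generated input struct; dimensions are illustrative only. */
typedef struct {
	uint8_t input1[1 * 32 * 32 * 3]; /* e.g. an RGB image */
	int8_t input2[10 * 5 * 5 * 3];
} tvm_model_input_t;

/* Declaring the struct gives the application full control of placement;
 * filling it is a plain memcpy from the application's data buffers. */
static tvm_model_input_t g_inputs;

static void fill_input1(const uint8_t* frame, size_t frame_len) {
	if (frame_len <= sizeof(g_inputs.input1)) {
		memcpy(g_inputs.input1, frame, frame_len);
	}
}
```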

### Custom-Workspace Compilation

I would take this a step further and ask if we can make the workspace size a `#define` constant such that the user could allocate the space at compile time. Or whether we expect this to live in the Model Library Format metadata as a means to access it at compile time. For example, instead of:

```c
TVMSetWorkspaces(&context, malloc(TVMGetWorkspaceSize(&my_model, 0)));
TVMExecute(&my_model, inputs, outputs, &context);
```

I'd like people to be able to:
```c
uint8_t g_workspace[TVM_MODEL_NAME_WORKSPACE_BYTES];

int main() {
  TVMSetWorkspaces(&context, g_workspace);
}
```

Finally, is it possible that whatever context is needed to identify the workspace could optionally live in flash? This has some benefits e.g. in simple deployment scenarios when the workspace is allocated as global memory. In this case, it's not possible to overwrite it with invalid pointers, which is a class of bugs that can be hard to trace down on embedded platforms.
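A sketch of what a flash-resident context could look like (struct and sizes are stand-ins): the workspace lives in global RAM while the pointer table and context are `const`, so they can be placed in read-only memory:

```c
#include <stdint.h>

/* Stand-in for the proposed context struct, with a const pointer table. */
typedef struct {
	void* const* workspace;
} TVMContext;

/* Workspace in global RAM; the pointer table and the context itself are
 * const, so the toolchain can place them in flash and they cannot be
 * corrupted with invalid pointers at runtime. */
static uint8_t g_workspace[2048];
static void* const g_workspace_ptrs[] = { g_workspace };
static const TVMContext g_context = { g_workspace_ptrs };
```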

### Context

> Paired with the model descriptor, this provides any contextual information required to run the model, such as an application driven workspace configuration:
> 
> ```
> typedef struct {
> 	void** workspace; /** Pointers to different memory to use as a workspace */
> } TVMContext;
> ```

I'd like to avoid general-purpose structs if possible, at least at this phase of the implementation. While I think it's likely some top-level glue struct will eventually be a useful entry point for developers (and something is likely going to be needed as `resource_handle`), I think there are still quite a few things related to e.g. multi-core and accelerator dispatch yet to be decided. Rather than provide a sort of "kitchen sink" struct, I'd like to encourage us to define dedicated places for each orthogonal aspect of computing. I think it'd be great to make progress on the API in this RFC and tackle the accelerator dispatch question in a follow-on.

### Generated APIs vs function pointers

When considering how to write user-facing APIs, I think we have a couple of choices:

G1. Generate a function call table e.g. `TVMModel` and write wrapper functions around it.

G2. Generate a wrapper function with a standard interface (or perhaps a standard templated model interface).

Here, I'm not necessarily proposing to generate a wrapper function with model-specific signatures (though that has been proposed elsewhere). Instead, I am just wondering whether it's necessary to place the `entrypoint` function pointer in `TVMModel`. It seems like we may have some desire to generate model-specific C++ metadata outside of that generated by the AOT codegen, so I wonder if it's worth it to just build a small codegen dedicated to this user-facing API now. Doing this would also remove the need for "accessor" functions such as `TVMGetTVMVersionMajor`.

### Accelerator binding

If possible, I'd like to defer this to a separate RFC. I think there are lots of questions to be answered there and it'd be necessary to review a lifecycle diagram of the accelerator to do so. I think that would be better placed in a separate RFC.




