Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/07 23:43:54 UTC

[GitHub] [tvm-rfcs] areusch commented on a change in pull request #31: C Device API

areusch commented on a change in pull request #31:
URL: https://github.com/apache/tvm-rfcs/pull/31#discussion_r703918487



##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+- Feature Name: C Device API
+- Start Date: 02-08-2021
+- RFC PR: [apache/tvm-rfcs#31](https://github.com/apache/tvm-rfcs/pull/31)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+
+# Summary
+[summary]: #summary
+This RFC aims to provide an API which can be used by the C runtime to abstract the variety of driver APIs for different platforms. This is specifically catering towards RTOS abstractions for embedded device drivers and aims to implement a subset of the overall Device API with supporting infrastructure to enable future expansion.
+
+# Motivation
+[motivation]: #motivation
+
+When using an accelerator, such as the [Arm&reg; Ethos&trade;-U](https://github.com/apache/tvm-rfcs/pull/11), an Embedded Real-Time Operating System (RTOS) will provide a device abstraction to access the device resource. When using these abstractions, TVM needs to understand how to interact with a device for a given platform.
+
+Taking the common example of a UART interface (imagining that the accelerator is accessed through this interface), in Zephyr this would look similar to:
+
+```c
+#include <zephyr.h>
+#include <device.h>
+
+struct device *uart_dev = device_get_binding("USART0");
+
+char data[] = "Hello World!\r\n";
+uart_tx(uart_dev, data, sizeof(data), 100);
+```
+
+Whereas in CMSIS, this would look more similar to:
+
+```c
+ARM_DRIVER_USART* uart_dev = &Driver_USART0;
+uart_dev->Initialize(NULL);
+
+char data[] = "Hello World!\r\n";
+uart_dev->Send(data, sizeof(data)/sizeof(data[0]));
+```
+
+In this example, you can see the diversity of RTOS implementations for drivers and why it's required to provide a flexible abstraction to pass devices for micro targets.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required; the user includes whichever is appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it, rather than having it created and set up for them by the API. This is augmented by named functions for each device; for example, in the case of the "woofles" accelerator:
+
+```c
+typedef void* tvm_device_woofles_t; // Opaque device type, defined per platform
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, ...); // Called by User App
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev); // Called by User App
+```
+
+Which is implemented as part of a User App:
+```c
+#include <tvm/runtime/device.h>
+#include <tvm/device/woofles/zephyr.h>
+
+struct device* woofles_zephyr_device = device_get_binding("WOOFLES0");
+tvm_device_woofles_t accelerator; // Opaque type for accelerator device
+TVMDeviceWooflesInit(&accelerator, woofles_zephyr_device);
+
+struct tvmgen_mynetwork_devices devices = {
+    .accelerator = accelerator
+};
+
+int32_t ret = tvmgen_mynetwork_run(
+    ...,
+    &devices
+);
+
+TVMDeviceWooflesDestroy(&accelerator);
+```
+
+## Platform Structures
+Users can take implementations from `src/runtime/crt/device` and headers from `include/runtime/crt/device` which map to their platform device implementation. In the case of a bare metal environment, this would default to a void pointer as there's no information available.

Review comment:
       ```suggestion
   Users can take implementations from `src/runtime/crt/device` and headers from `include/runtime/crt/device` which map to their platform device implementation. The simplest definition of `tvm_device_<device>_t` is `void*`, as no information is provided to TVM.
   ```
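
As a rough illustration of that simplest case, a bare-metal port could look something like the sketch below. The `void*` typedef and the `TVMDeviceWoofles*` names come from the RFC; the no-op bodies, the single-argument `Init`, and the use of `0` as the success code are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplest definition: TVM is given no information about the device. */
typedef void* tvm_device_woofles_t;

/* Assumed convention for this sketch: 0 means success. */
int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev) {
    *tvm_dev = NULL; /* nothing to store on bare metal */
    return 0;
}

/* The remaining hooks have nothing to manage, so they are no-ops. */
int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev) { (void)tvm_dev; return 0; }
int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev) { (void)tvm_dev; return 0; }
int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev) { (void)tvm_dev; return 0; }
int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev) { (void)tvm_dev; return 0; }
int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev) { (void)tvm_dev; return 0; }
```

Because every hook still exists with the same signature, generated code would not need to know whether it is running against this stub or a real RTOS-backed implementation.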

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+
+```c
+typedef void* tvm_device_woofles_t;
+```
+
+For RTOS implementations, a structure can be created such as this simple Zephyr wrapper (include/runtime/crt/platform/zephyr.h):
+
+```c
+#include <device.h>
+
+typedef struct {
+    struct device* dev;
+} tvm_device_woofles_t;
+```
+
+This gives the OS maximum control over the resources required and provides the opportunity to craft code in whichever way is most idiomatic for that platform, for example if an additional locking mechanism is required:
+
+```c
+#include <device.h>
+#include <kernel.h>
+
+typedef struct {
+    struct device* dev;
+    struct k_mutex lock;
+} tvm_device_woofles_t;
+```
+
+## Generic Device API
+The majority of the device API calls should be added to the platform-agnostic `<device>.h`:
+```c
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+```
+
+These can all be implemented using the user-opaque context `tvm_device_<device>_t*`, enabling the majority of TVM code to be portable between RTOS implementations; importantly this applies to those called within operator functions (see below). The executors are agnostic to the underlying device implementation and simply get passed the relevant device pointer which is then passed to the correct symbol.
+
+## Platform Device API
+To allow setting of platform specifics into the opaque struct, these should be defined in the platform header. Alongside the header, an additional file will provide implementations (`src/runtime/crt/device/<device>/<platform>.c`):
+```c
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, struct device* zephyr_dev) {
+    tvm_dev->dev = zephyr_dev; // Store the Zephyr device in the TVM wrapper struct
+    return 0;
+}
+```
+This simple wrapper enables type checking of these functions and defines a clear translation boundary between the underlying OS implementation and TVM.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Entrypoint
+The entrypoint API defined in [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951) is augmented with the `devices` structure which contains implemented `tvm_device_t` `struct`s for each device used by the network:
+```c
+typedef struct {
+    tvm_device_woofles_t* woofles;
+} tvmgen_mynetwork_devices;
+```
+
+These are re-cast to `void *` when entering the AOT main function, so that they can be passed without TIR understanding the struct types.
+
+```c
+int32_t tvmgen_mynetwork_run(
+    ...,
+    struct tvmgen_mynetwork_devices* devices
+) {
+    tvmgen_mynetwork_run_model(
+        ...,
+        devices->host,

Review comment:
       shall we just pass the devices struct and use `tir.tvm_struct_get`?
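
To make the question concrete, here is a hedged C-level sketch of what passing the whole struct could look like: the entrypoint hands one opaque pointer to the inner function, which then reads the members it needs (roughly the role proposed for `tir.tvm_struct_get` on the TIR side). The struct layout, the `void*` device type, the simplified argument lists, and the NULL check are assumptions for illustration, not generated output.

```c
#include <stddef.h>
#include <stdint.h>

typedef void* tvm_device_woofles_t; /* assumed simplest device type */

struct tvmgen_mynetwork_devices {
    tvm_device_woofles_t* host;
    tvm_device_woofles_t* accelerator;
};

/* The inner function receives one opaque pointer instead of one argument per device. */
int32_t tvmgen_mynetwork_run_model(void* devices_handle) {
    struct tvmgen_mynetwork_devices* devices =
        (struct tvmgen_mynetwork_devices*)devices_handle;
    /* Plain member access stands in for a struct-get inside TIR. */
    if (devices->host == NULL || devices->accelerator == NULL) {
        return -1;
    }
    return 0;
}

int32_t tvmgen_mynetwork_run(struct tvmgen_mynetwork_devices* devices) {
    return tvmgen_mynetwork_run_model((void*)devices);
}
```

The trade-off in this shape is that the inner function signature stays fixed as devices are added, at the cost of the inner function needing to know the struct layout.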

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+These are re-cast to `void *` when entering the AOT main function, so that they can be passed without TIR understanding the struct types.
+
+```c
+int32_t tvmgen_mynetwork_run(
+    ...,
+    struct tvmgen_mynetwork_devices* devices
+) {
+    return tvmgen_mynetwork_run_model(
+        ...,
+        devices->host,
+        devices->accelerator
+    );
+}
+```
+
+## Executor Function
+Each operator is provided with a single device object which can be abstracted and passed as the `void* resource_handle`. The main function calls into the device API to set up and tear down resources before and after each operator call.
+
+```c
+int32_t tvmgen_mynetwork_run_model(..., void* device0, void* device1) {
+    TVMDeviceWooflesActivate(device0); // Could reserve or enable certain circuitry ahead of time
+    TVMDeviceWooflesActivate(device1); // Could reserve or enable certain circuitry ahead of time
+
+    TVMDeviceWooflesOpen(device0); // Opens resource for use
+    operator(device0); // Pass resource_handle to operator
+    TVMDeviceWooflesClose(device0); // Close device use
+
+    TVMDeviceWooflesOpen(device1); // Opens resource for use
+    operator(device1); // Pass resource_handle to operator
+    TVMDeviceWooflesClose(device1); // Close device use
+
+    TVMDeviceWooflesDeactivate(device0); // Turn off the device
+    TVMDeviceWooflesDeactivate(device1); // Turn off the device
+    return 0;
+}
+```
+
+This is a simple and likely sufficient set of hooks which can be used to manage these device transactions.
+
+## Device API Functions
+In the example of Zephyr, devices are already a first-class concept, so many of the functions will be no-ops. Should synchronisation be required, an example implementation could be:
+
+```c
+#include <device.h>
+#include <kernel.h>
+
+typedef struct {
+    struct device* dev;
+    struct k_mutex lock;
+} tvm_device_woofles_t;
+
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, struct device* zephyr_dev) {
+    k_mutex_init(&tvm_dev->lock);
+    tvm_dev->dev = zephyr_dev;
+    return 0;
+}
+
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev) { return 0; } // No-op
+
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev) {
+    return k_mutex_lock(&tvm_dev->lock, K_FOREVER); // Block until the device is free
+}
+
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev) {
+    return k_mutex_unlock(&tvm_dev->lock);
+}
+
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev) { return 0; } // No-op
+
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev) {
+    tvm_dev->dev = NULL;
+    return 0;
+}
+```
+
+Whereas for CMSIS, you can use the platform-specific function to encapsulate the API of our imaginary UART-accessed accelerator:
+
+```c
+typedef struct {
+    ARM_DRIVER_USART* dev;
+} tvm_device_uart_accel_t;
+
+int32_t TVMDeviceUartAccelInit(tvm_device_uart_accel_t* tvm_dev, ARM_DRIVER_USART* uart_dev) {
+    int32_t status = uart_dev->Initialize(NULL); // CMSIS status code; ARM_DRIVER_OK is 0
+    tvm_dev->dev = uart_dev;
+    return status;
+}
+
+int32_t TVMDeviceUartAccelActivate(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelOpen(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelClose(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelDeactivate(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+
+int32_t TVMDeviceUartAccelDestroy(tvm_device_uart_accel_t* tvm_dev) {
+    return tvm_dev->dev->Uninitialize();
+}
+```
+
+## Operator Usage
+Each operator would be expected to utilise one device structure and be passed that as the `resource_handle` parameter, making the assumption that each operator or variant of an operator is only bound to one device at a time. The following example shows how an accelerator's interface is implemented per platform to take this void pointer and call the platform-specific driver code.
+
+```c
+// Operator takes opaque resource_handle
+int32_t my_operator(..., void* resource_handle) {
+    if (TVMDeviceWooflesInvoke(resource_handle, ...ins,outs,params...) != 0) {
+        return -1;
+    }
+    return 0;
+}
+
+// Platform implementation
+int32_t TVMDeviceWooflesInvoke(tvm_device_woofles_t* tvm_dev, ...ins,outs,params...) {
+    struct device* zephyr_dev = tvm_dev->dev;
+    my_accelerator_invoke(
+        zephyr_dev,
+        ...ins,outs,params...
+    );
+    return 0;
+}
+```
+
+## PrimFunc Resource Handle
+A `tir::Var` is added to `PrimFunc` in `include/tvm/tir/function.h` which enables a `PrimFunc` to track and use the `resource_handle` parameter. This will be used by both unpacked and packed APIs to pass the resource down as a `void *` rather than packing it into a `TVMValue`.

Review comment:
       it kind of seems like we could get away without doing this by tracking which Target a function is going to run on and looking up the associated device API function in AOT/graph codegen. what do you think?
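
One way to picture this alternative, as a sketch only: codegen keeps a table keyed by target name and emits calls through whichever entry matches the function's Target. The table below is written in C purely for illustration (in practice the lookup would happen at compile time inside codegen); the `device_api_t` struct, the target names, and `lookup_device_api` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t (*device_hook_t)(void* dev);

/* Hypothetical per-target record of the device API hooks to call. */
typedef struct {
    const char* target; /* hypothetical target name */
    device_hook_t open;
    device_hook_t close;
} device_api_t;

/* Stand-in hooks; a real port would forward to its TVMDevice<Device>Open/Close. */
static int32_t WooflesOpen(void* dev) { (void)dev; return 0; }
static int32_t WooflesClose(void* dev) { (void)dev; return 0; }

static const device_api_t kDeviceApis[] = {
    {"woofles", WooflesOpen, WooflesClose},
};

/* Returns the hooks for a target, or NULL when the target has no device API. */
const device_api_t* lookup_device_api(const char* target) {
    for (size_t i = 0; i < sizeof(kDeviceApis) / sizeof(kDeviceApis[0]); ++i) {
        if (strcmp(kDeviceApis[i].target, target) == 0) {
            return &kDeviceApis[i];
        }
    }
    return NULL;
}
```

With a lookup like this, the device association would live alongside the Target rather than on each `PrimFunc`, which is the simplification the comment is asking about.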

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required; the user includes whichever is appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it, rather than having it created and set up for them by the API. This is augmented by named functions for each device; for example, in the case of the "woofles" accelerator:
+
+```c

Review comment:
       can we include my life cycle picture and annotate with where each function is called? also can we document exactly why TVM calls each function and what it's expected to do (e.g. pre- and post-conditions)?
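
As a starting point for documenting pre- and post-conditions, the sketch below encodes one possible reading of the life cycle as a small state machine, where each function checks its pre-condition and establishes its post-condition. The states and the strict ordering (Init, Activate, Open, Close, Deactivate, Destroy) are assumptions inferred from the call order in the RFC's executor example, not something the RFC specifies; `dev_state_t` and `transition` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical states, inferred from the call order in the executor example. */
typedef enum { DEV_UNINIT, DEV_READY, DEV_ACTIVE, DEV_OPEN } dev_state_t;

typedef struct {
    dev_state_t state;
} tvm_device_woofles_t;

/* Pre-condition: state == from. Post-condition: state == to. Returns -1 on violation. */
static int32_t transition(tvm_device_woofles_t* dev, dev_state_t from, dev_state_t to) {
    if (dev->state != from) {
        return -1;
    }
    dev->state = to;
    return 0;
}

/* Called by the user app: no pre-condition, leaves the device READY. */
int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* dev) {
    dev->state = DEV_READY;
    return 0;
}

/* Called by generated code, once per inference. */
int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* dev) {
    return transition(dev, DEV_READY, DEV_ACTIVE);
}

/* Called by generated code, around each operator. */
int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* dev) {
    return transition(dev, DEV_ACTIVE, DEV_OPEN);
}

int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* dev) {
    return transition(dev, DEV_OPEN, DEV_ACTIVE);
}

int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* dev) {
    return transition(dev, DEV_ACTIVE, DEV_READY);
}

/* Called by the user app: the device must not be active or open. */
int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* dev) {
    return transition(dev, DEV_READY, DEV_UNINIT);
}
```

Under these assumed rules, calling `TVMDeviceWooflesOpen` before `TVMDeviceWooflesActivate` fails, which is the kind of constraint the RFC text could state explicitly for each function.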

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+## Entrypoint
+The entrypoint API defined in [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951) is augmented with the `devices` structure which contains implemented `tvm_device_t` `struct`s for each device used by the network:

Review comment:
       ```suggestion
   The entrypoint API defined in [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951) is augmented with the `tvm_mynetwork_devices` structure which contains implemented `tvm_device_t` `struct`s for each device used by the network:
   ```

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required, these are included by the user who chooses as appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it rather than it being created and setup for them in the API. This is augmented by named functions for each device, examples in the case of the "woofles" accelerator:

Review comment:
       This would be clearer as: "TVM presumes that the RTOS, platform, or user application defines a struct type `tvm_device_<device>_t`." The last sentence here should also state that the user must implement a set of named functions.

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+- Feature Name: C Device API
+- Start Date: 02-08-2021
+- RFC PR: [apache/tvm-rfcs#31](https://github.com/apache/tvm-rfcs/pull/31)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+
+# Summary
+[summary]: #summary
+This RFC aims to provide an API which can be used by the C runtime to abstract the variety of driver APIs for different platforms. This is specifically catering towards RTOS abstractions for embedded device drivers and aims to implement a subset of the overall Device API with supporting infrastructure to enable future expansion.
+
+# Motivation
+[motivation]: #motivation
+
+When using an accelerator, such as the [Arm&reg; Ethos&trade;-U](https://github.com/apache/tvm-rfcs/pull/11), an Embedded Real-Time Operating System (RTOS) will provide a device abstraction to access the device resource. When using these abstractions, TVM needs to understand how to interact with a device for a given platform.
+
+Taking the common example of a UART interface (imagining the accelerator is communicated to via this interface); in Zephyr, this would look similar to:
+
+```c
+#include <zephyr.h>
+#include <device.h>
+
+struct device *uart_dev = device_get_binding("USART0");
+
+char data[] = "Hello World!\r\n";
+uart_tx(uart_dev, data, sizeof(data), 100);
+```
+
+Whereas in CMSIS, this would look more similar to:
+
+```c
+ARM_DRIVER_USART* uart_dev = &Driver_USART0;
+uart_dev->Initialize(NULL);
+
+char data[] = "Hello World!\r\n";
+uart_dev->Send(data, sizeof(data)/sizeof(data[0]));
+```
+
+In this example, you can see the diversity of RTOS implementations for drivers and why it's required to provide a flexible abstraction to pass devices for micro targets.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required, these are included by the user who chooses as appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it rather than it being created and setup for them in the API. This is augmented by named functions for each device, examples in the case of the "woofles" accelerator:
+
+```c
+typedef void* tvm_device_woofles_t; // Provided by the platform or User App
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, ...); // Called by User App
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev); // Called by User App
+```
+
+Which is implemented as part of a User App:
+```c
+#include <tvm/runtime/device.h>
+#include <tvm/device/woofles/zephyr.h>
+
+struct device* woofles_zephyr_device = device_get_binding("WOOFLES0");
+tvm_device_woofles_t accelerator; // Opaque type for accelerator device
+TVMDeviceWooflesInit(&accelerator, woofles_zephyr_device);
+
+struct tvmgen_mynetwork_devices devices = {
+    .accelerator = &accelerator
+};
+
+int32_t ret = tvmgen_mynetwork_run(
+    ...,
+    &devices
+);
+
+TVMDeviceWooflesDestroy(&accelerator);
+```
+
+## Platform Structures
+Users can take implementations from `src/runtime/crt/device` and headers from `include/runtime/crt/device` that map to their platform device implementation. In a bare-metal environment, the device type would default to a void pointer, as there's no further information available.
+
+```c
+typedef void* tvm_device_woofles_t;
+```
+
+For RTOS implementations, a structure can be created such as this simple Zephyr wrapper (include/runtime/crt/platform/zephyr.h):
+
+```c
+#include <device.h>
+
+typedef struct {
+    struct device* dev;
+} tvm_device_woofles_t;
+```
+
+This enables the OS maximum control over the resources required and provides the opportunity to craft code in whichever way is most idiomatic for that platform, such as if an additional locking mechanism is required:

Review comment:
       ```suggestion
   This enables the OS maximum control over the resources required, allows the user application to consolidate the memory used by the device control structures, and provides the opportunity to craft code in whichever way is most idiomatic for that platform, such as if an additional locking mechanism is required:
   ```

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+- Feature Name: C Device API
+- Start Date: 02-08-2021
+- RFC PR: [apache/tvm-rfcs#31](https://github.com/apache/tvm-rfcs/pull/31)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+
+# Summary
+[summary]: #summary
+This RFC aims to provide an API which can be used by the C runtime to abstract the variety of driver APIs for different platforms. This is specifically catering towards RTOS abstractions for embedded device drivers and aims to implement a subset of the overall Device API with supporting infrastructure to enable future expansion.
+
+# Motivation
+[motivation]: #motivation
+
+When using an accelerator, such as the [Arm&reg; Ethos&trade;-U](https://github.com/apache/tvm-rfcs/pull/11), an Embedded Real-Time Operating System (RTOS) will provide a device abstraction to access the device resource. When using these abstractions, TVM needs to understand how to interact with a device for a given platform.
+
+Taking the common example of a UART interface (imagining the accelerator is communicated to via this interface); in Zephyr, this would look similar to:
+
+```c
+#include <zephyr.h>
+#include <device.h>
+
+struct device *uart_dev = device_get_binding("USART0");
+
+char data[] = "Hello World!\r\n";
+uart_tx(uart_dev, data, sizeof(data), 100);
+```
+
+Whereas in CMSIS, this would look more similar to:
+
+```c
+ARM_DRIVER_USART* uart_dev = &Driver_USART0;
+uart_dev->Initialize(NULL);
+
+char data[] = "Hello World!\r\n";
+uart_dev->Send(data, sizeof(data)/sizeof(data[0]));
+```
+
+In this example, you can see the diversity of RTOS implementations for drivers and why it's required to provide a flexible abstraction to pass devices for micro targets.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required, these are included by the user who chooses as appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it rather than it being created and setup for them in the API. This is augmented by named functions for each device, examples in the case of the "woofles" accelerator:
+
+```c
+typedef void* tvm_device_woofles_t; // Provided by the platform or User App
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, ...); // Called by User App
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev); // Called by User App
+```
+
+Which is implemented as part of a User App:
+```c
+#include <tvm/runtime/device.h>
+#include <tvm/device/woofles/zephyr.h>
+
+struct device* woofles_zephyr_device = device_get_binding("WOOFLES0");
+tvm_device_woofles_t accelerator; // Opaque type for accelerator device
+TVMDeviceWooflesInit(&accelerator, woofles_zephyr_device);
+
+struct tvmgen_mynetwork_devices devices = {
+    .accelerator = &accelerator
+};
+
+int32_t ret = tvmgen_mynetwork_run(
+    ...,
+    &devices
+);
+
+TVMDeviceWooflesDestroy(&accelerator);
+```
+
+## Platform Structures
+Users can take implementations from `src/runtime/crt/device` and headers from `include/runtime/crt/device` that map to their platform device implementation. In a bare-metal environment, the device type would default to a void pointer, as there's no further information available.
+
+```c
+typedef void* tvm_device_woofles_t;
+```
+
+For RTOS implementations, a structure can be created such as this simple Zephyr wrapper (include/runtime/crt/platform/zephyr.h):
+
+```c
+#include <device.h>
+
+typedef struct {
+    struct device* dev;
+} tvm_device_woofles_t;
+```
+
+This enables the OS maximum control over the resources required and provides the opportunity to craft code in whichever way is most idiomatic for that platform, such as if an additional locking mechanism is required:
+
+```c
+#include <device.h>
+#include <kernel.h>
+
+typedef struct {
+    struct device* dev;
+    struct k_mutex lock;
+} tvm_device_woofles_t;
+```
+
+## Generic Device API
+The majority of the device API calls should be added to the platform-agnostic `<device>.h`:
+```c
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+```
+
+These can all be implemented using the user-opaque context `tvm_device_<device>_t*`, enabling the majority of TVM code to be portable between RTOS implementations; importantly this applies to those called within operator functions (see below). The executors are agnostic to the underlying device implementation and simply get passed the relevant device pointer which is then passed to the correct symbol.
+
+## Platform Device API
+To allow setting of platform specifics into the opaque struct, these should be defined in the platform header. Alongside the header, an additional file will provide implementations (`src/runtime/crt/device/<device>/<platform>.c`):
+```c
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, struct device* zephyr_dev) {
+    tvm_dev->dev = zephyr_dev;
+    return 0;
+}
+```
+This simple wrapper enables type checking of these functions and defines a clear translation boundary between the underlying OS implementation and TVM.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Entrypoint
+The entrypoint API defined in [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951) is augmented with the `devices` structure which contains implemented `tvm_device_t` `struct`s for each device used by the network:
+```c
+struct tvmgen_mynetwork_devices {
+    tvm_device_woofles_t* woofles;
+};
+```
+
+These are re-cast to `void *` when entering the AOT main function so that they can be passed through without TIR understanding the struct types.
+
+```c
+int32_t tvmgen_mynetwork_run(
+    ...,
+    struct tvmgen_mynetwork_devices* devices
+) {
+    return tvmgen_mynetwork_run_model(
+        ...,
+        devices->host,
+        devices->accelerator
+    );
+}
+```
+
+## Executor Function
+Each operator is provided with a single device object which can be abstracted and passed as the `void* resource_handle`. The main function calls into the device API to set up and tear down resources before and after each operator call.
+
+```c
+int32_t tvmgen_mynetwork_run_model(..., device0, device1) {
+    TVMDeviceWooflesActivate(device0); // Could reserve or enable certain circuitry ahead of time
+    TVMDeviceWooflesActivate(device1); // Could reserve or enable certain circuitry ahead of time
+
+    TVMDeviceWooflesOpen(device0); // Opens resource for use
+    operator(device0); // Pass resource_handle to operator
+    TVMDeviceWooflesClose(device0); // Close device use
+
+    TVMDeviceWooflesOpen(device1); // Opens resource for use
+    operator(device1); // Pass resource_handle to operator
+    TVMDeviceWooflesClose(device1); // Close device use
+
+    TVMDeviceWooflesDeactivate(device0); // Turn off the device
+    TVMDeviceWooflesDeactivate(device1); // Turn off the device
+
+    return 0;
+}
+```
+
+This is a simple and likely sufficient set of hooks which can be used to manage these device transactions.
+
+## Device API Functions
+In the example of Zephyr, devices are already a first-class concept, so many of the functions will be no-ops. Should synchronisation be required, an example implementation could be:
+
+```c
+#include <device.h>
+#include <kernel.h>
+
+typedef struct {
+    struct device* dev;
+    struct k_mutex lock;
+} tvm_device_woofles_t;
+
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, struct device* zephyr_dev) {
+    k_mutex_init(&tvm_dev->lock);
+    tvm_dev->dev = zephyr_dev;
+    return 0;
+}
+
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev) { return 0; } // No-op
+
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev) {
+    return k_mutex_lock(&tvm_dev->lock, K_FOREVER); // Blocks until the device is free
+}
+
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev) {
+    k_mutex_unlock(&tvm_dev->lock);
+    return 0;
+}
+
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev) { return 0; } // No-op
+
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev) {
+    tvm_dev->dev = NULL;
+    return 0;
+}
+```
+
+Whereas for CMSIS, you can use the platform-specific function to encapsulate the API to our imaginary UART-accessed accelerator:
+
+```c
+#include "Driver_USART.h"
+
+typedef struct {
+    ARM_DRIVER_USART* dev;
+} tvm_device_uart_accel_t;
+
+int32_t TVMDeviceUartAccelInit(tvm_device_uart_accel_t* tvm_dev, ARM_DRIVER_USART* uart_dev) {
+    tvm_dev->dev = uart_dev;
+    return uart_dev->Initialize(NULL); // Propagate the driver status code
+}
+
+int32_t TVMDeviceUartAccelActivate(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelOpen(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelClose(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+int32_t TVMDeviceUartAccelDeactivate(tvm_device_uart_accel_t* tvm_dev) { return 0; }
+
+int32_t TVMDeviceUartAccelDestroy(tvm_device_uart_accel_t* tvm_dev) {
+    return tvm_dev->dev->Uninitialize(); // Propagate the driver status code
+}
+```
+
+## Operator Usage
+Each operator would be expected to utilise one device structure and be passed that as the `resource_handle` parameter, making the assumption that each operator or variant of an operator is only bound to one device at a time. The following example shows how an accelerator's interface is implemented per platform to take this void pointer and call the platform-specific driver code.

Review comment:
       Could you specify how TVM knows which device structure an operator uses? Also, can you qualify that this applies only to the C runtime, and to the C++ runtime when BYOC targets are linked in?

##########
File path: rfcs/0031-devices-api.md
##########
@@ -0,0 +1,363 @@
+- Feature Name: C Device API
+- Start Date: 02-08-2021
+- RFC PR: [apache/tvm-rfcs#31](https://github.com/apache/tvm-rfcs/pull/31)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+
+
+# Summary
+[summary]: #summary
+This RFC aims to provide an API which can be used by the C runtime to abstract the variety of driver APIs for different platforms. This is specifically catering towards RTOS abstractions for embedded device drivers and aims to implement a subset of the overall Device API with supporting infrastructure to enable future expansion.
+
+# Motivation
+[motivation]: #motivation
+
+When using an accelerator, such as the [Arm&reg; Ethos&trade;-U](https://github.com/apache/tvm-rfcs/pull/11), an Embedded Real-Time Operating System (RTOS) will provide a device abstraction to access the device resource. When using these abstractions, TVM needs to understand how to interact with a device for a given platform.
+
+Taking the common example of a UART interface (imagining the accelerator is communicated to via this interface); in Zephyr, this would look similar to:
+
+```c
+#include <zephyr.h>
+#include <device.h>
+
+struct device *uart_dev = device_get_binding("USART0");
+
+char data[] = "Hello World!\r\n";
+uart_tx(uart_dev, data, sizeof(data), 100);
+```
+
+Whereas in CMSIS, this would look more similar to:
+
+```c
+ARM_DRIVER_USART* uart_dev = &Driver_USART0;
+uart_dev->Initialize(NULL);
+
+char data[] = "Hello World!\r\n";
+uart_dev->Send(data, sizeof(data)/sizeof(data[0]));
+```
+
+In this example, you can see the diversity of RTOS implementations for drivers and why it's required to provide a flexible abstraction to pass devices for micro targets.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## User App
+`tvm_device_<device>_t`s are implemented for each RTOS or platform required, these are included by the user who chooses as appropriate for their application. Notably, to avoid dynamic allocation, the user must provide the `tvm_device_<device>_t` struct and initialise it rather than it being created and setup for them in the API. This is augmented by named functions for each device, examples in the case of the "woofles" accelerator:
+
+```c
+typedef void* tvm_device_woofles_t; // Provided by the platform or User App
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, ...); // Called by User App
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDestroy(tvm_device_woofles_t* tvm_dev); // Called by User App
+```
+
+Which is implemented as part of a User App:
+```c
+#include <tvm/runtime/device.h>
+#include <tvm/device/woofles/zephyr.h>
+
+struct device* woofles_zephyr_device = device_get_binding("WOOFLES0");
+tvm_device_woofles_t accelerator; // Opaque type for accelerator device
+TVMDeviceWooflesInit(&accelerator, woofles_zephyr_device);
+
+struct tvmgen_mynetwork_devices devices = {
+    .accelerator = &accelerator
+};
+
+int32_t ret = tvmgen_mynetwork_run(
+    ...,
+    &devices
+);
+
+TVMDeviceWooflesDestroy(&accelerator);
+```
+
+## Platform Structures
+Users can take implementations from `src/runtime/crt/device` and headers from `include/runtime/crt/device` that map to their platform device implementation. In a bare-metal environment, the device type would default to a void pointer, as there's no further information available.
+
+```c
+typedef void* tvm_device_woofles_t;
+```
+
+For RTOS implementations, a structure can be created such as this simple Zephyr wrapper (include/runtime/crt/platform/zephyr.h):
+
+```c
+#include <device.h>
+
+typedef struct {
+    struct device* dev;
+} tvm_device_woofles_t;
+```
+
+This enables the OS maximum control over the resources required and provides the opportunity to craft code in whichever way is most idiomatic for that platform, such as if an additional locking mechanism is required:
+
+```c
+#include <device.h>
+#include <kernel.h>
+
+typedef struct {
+    struct device* dev;
+    struct k_mutex lock;
+} tvm_device_woofles_t;
+```
+
+## Generic Device API
+The majority of the device API calls should be added to the platform-agnostic `<device>.h`:
+```c
+int32_t TVMDeviceWooflesActivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev); // Called by generated code
+int32_t TVMDeviceWooflesDeactivate(tvm_device_woofles_t* tvm_dev); // Called by generated code
+```
+
+These can all be implemented using the user-opaque context `tvm_device_<device>_t*`, enabling the majority of TVM code to be portable between RTOS implementations; importantly this applies to those called within operator functions (see below). The executors are agnostic to the underlying device implementation and simply get passed the relevant device pointer which is then passed to the correct symbol.
+
+## Platform Device API
+To allow setting of platform specifics into the opaque struct, these should be defined in the platform header. Alongside the header, an additional file will provide implementations (`src/runtime/crt/device/<device>/<platform>.c`):
+```c
+int32_t TVMDeviceWooflesInit(tvm_device_woofles_t* tvm_dev, struct device* zephyr_dev) {
+    tvm_dev->dev = zephyr_dev;
+    return 0;
+}
+```
+This simple wrapper enables type checking of these functions and defines a clear translation boundary between the underlying OS implementation and TVM.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+## Entrypoint
+The entrypoint API defined in [Embedded C Runtime Interface](https://discuss.tvm.apache.org/t/rfc-utvm-embedded-c-runtime-interface/9951) is augmented with the `devices` structure which contains implemented `tvm_device_t` `struct`s for each device used by the network:
+```c
+struct tvmgen_mynetwork_devices {
+    tvm_device_woofles_t* woofles;
+};
+```
+
+These are re-cast to `void *` when entering the AOT main function so that they can be passed through without TIR understanding the struct types.
+
+```c
+int32_t tvmgen_mynetwork_run(
+    ...,
+    struct tvmgen_mynetwork_devices* devices
+) {
+    return tvmgen_mynetwork_run_model(
+        ...,
+        devices->host,
+        devices->accelerator
+    );
+}
+```
+
+## Executor Function
+Each operator is provided with a single device object which can be abstracted and passed as the `void* resource_handle`. The main function calls into the device API to set up and tear down resources before and after each operator call.
+
+```c
+int32_t tvmgen_mynetwork_run_model(..., device0, device1) {
+    TVMDeviceWooflesActivate(device0); // Could reserve or enable certain circuitry ahead of time

Review comment:
       It would be great to say something about memory here, and perhaps to include a note that memory copies into the device API-managed memory may occur before `TVMDeviceWooflesOpen`.
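
       To illustrate the ordering, a minimal C sketch might look like the following. Everything in it is an assumption for discussion rather than part of the RFC: the `staging` buffer standing in for device API-managed memory, the `is_open` flag, the `TVMDeviceWooflesCopyIn` helper, and `woofles_run_operator` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOOFLES_STAGING_BYTES 64 /* hypothetical size of the device-managed memory */

/* Hypothetical device struct: a staging buffer stands in for device API-managed memory. */
typedef struct {
    uint8_t staging[WOOFLES_STAGING_BYTES];
    size_t staged_len;
    int is_open;
} tvm_device_woofles_t;

/* Hypothetical helper (not in the RFC): copy operator input into device-managed memory. */
int32_t TVMDeviceWooflesCopyIn(tvm_device_woofles_t* tvm_dev, const void* src, size_t len) {
    if (tvm_dev == NULL || src == NULL || len > WOOFLES_STAGING_BYTES) return -1;
    if (tvm_dev->is_open) return -1; /* in this sketch, copies must happen before Open */
    memcpy(tvm_dev->staging, src, len);
    tvm_dev->staged_len = len;
    return 0;
}

int32_t TVMDeviceWooflesOpen(tvm_device_woofles_t* tvm_dev) {
    tvm_dev->is_open = 1;
    return 0;
}

int32_t TVMDeviceWooflesClose(tvm_device_woofles_t* tvm_dev) {
    tvm_dev->is_open = 0;
    return 0;
}

/* Mirrors the per-operator sequence in the executor function, with the copy added first. */
int32_t woofles_run_operator(tvm_device_woofles_t* dev, const uint8_t* input, size_t len) {
    assert(dev != NULL); /* debug-time precondition */
    if (TVMDeviceWooflesCopyIn(dev, input, len) != 0) return -1;
    if (TVMDeviceWooflesOpen(dev) != 0) return -1;
    /* operator(dev) would run here and read from dev->staging */
    return TVMDeviceWooflesClose(dev);
}
```

       Whether the copy belongs in the generated code or behind the device API is exactly the kind of detail the RFC text could pin down.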



