Posted to dev@tvm.apache.org by Andrew Reusch via Apache TVM Discuss on 2020/11/13

[Apache TVM Discuss] [Development/RFC] [RFC] Linked Parameters for CPU Targets


In collaboration with @tqchen

See also: [PoC](https://github.com/apache/incubator-tvm/pull/6917)

## Overview

In RAM-limited deployment scenarios (e.g. µTVM), it's desirable to place as much constant data as possible in a separate binary section and use it directly from that section. To that end, this RFC proposes a way for TVM to include pre-linked parameters in the generated `runtime::Module`.

Depending on the target and available codegen, the solution to this problem could be quite expansive. For example, some architectures could benefit from a specific way of encoding parameters, while others may prefer to encode parameters for consumption by specific hardware accelerators. This RFC doesn't aim to preclude future work in those directions, but in the interest of forward progress, we constrain our goal to simply removing the need for GraphRuntime to allocate RAM space for parameter tensors used by the `tvm.cpu()` contexts. Only the `c` and `llvm` codegens are considered here. At the end, some future directions are discussed.

## Challenges

There are several challenges to be solved here:

C1. Indicating to the Relay compiler that the user wants to enable this feature

C2. Passing the set of parameters from `GraphRuntimeCodegen` to the target-specific codegen

C3. Loading linked parameters at runtime

We start from the end and work backwards.

### C3. Loading Linked Parameters at Runtime

Parameters can be stored either separately or as a single binary blob. The following storage schemes were considered:

S1. The contents of each parameter's `DLTensor` `data` field are stored as a symbol named `__tvm_param__pN`, where `pN` corresponds to the parameter's name after passing through `GraphRuntimeCodegen`.

S2. Similar to S1, but the full `DLTensor` is stored as well.

S3. Place parameters in the module metadata.

S3 is most compatible with the existing codegen, but it has these disadvantages:

* Since parameters are encoded as a single metadata blob, traditional binary size analysis tools (e.g. `objdump`, `nm`) will report only the size of the whole metadata blob, not the size of each parameter.
* Parameters can't be pinned in memory or assigned to specific sections (unless the entire metadata blob fits in the desired section).
* At runtime, parameter pointers are initially encoded as offsets into the metadata blob, requiring knowledge of the metadata layout at debug time.

S2 is the easiest to reason about logically (a `DLTensor` is a concept users are likely to understand). However, it would require encoding the `DLTensor` struct layout into each codegen, which could become hard to maintain. It's also overkill, since the `DLTensor` metadata are already stored in the JSON graph given to the GraphRuntime and are also sent over RPC.
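
To make the maintenance concern concrete, here is a hypothetical sketch of what an S2-style `c` codegen would need to emit for a single parameter. Everything here is illustrative, and the `DLTensor`/`DLContext` layout shown is the DLPack definition current at the time of this RFC:

```c
#include <dlpack/dlpack.h>

/* Hypothetical S2-style output for one parameter. The codegen must
 * reproduce the DLTensor struct layout, coupling every backend to the
 * DLPack version in use. Names, shape, and contents are illustrative. */
static const float __tvm_param__p0_data[4] = {0.1f, 0.2f, 0.3f, 0.4f};
static const int64_t __tvm_param__p0_shape[1] = {4};

static const DLTensor __tvm_param__p0 = {
    .data = (void*)__tvm_param__p0_data,
    .ctx = {kDLCPU, 0},          /* DLContext, per DLPack at this RFC's time */
    .ndim = 1,
    .dtype = {kDLFloat, 32, 1},  /* float32, one lane */
    .shape = (int64_t*)__tvm_param__p0_shape,
    .strides = NULL,
    .byte_offset = 0,
};
```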

S1 provides the benefit of linked parameters without much overhead.
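
Under S1, the emitted data reduces to one `static const` array per parameter. A minimal sketch of hypothetical `c`-codegen output (symbol names follow the `__tvm_param__pN` convention above; contents are illustrative):

```c
/* Hypothetical S1-style output: only the raw tensor contents are emitted,
 * one symbol per parameter. Shape and dtype metadata already live in the
 * JSON graph handed to GraphRuntime, so nothing else is needed here. */
static const float __tvm_param__p0[4] = {0.1f, 0.2f, 0.3f, 0.4f};
static const float __tvm_param__p1[2] = {1.0f, -1.0f};
```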

Schemes S1 and S2 don't specify how parameters are looked up at runtime. We now consider this problem. At run time, `GraphRuntime` knows the string `name` and integer `storage_id` of each parameter. Either can be used to identify the tensor to be loaded (in some cases, `GraphRuntime` reuses a `storage_id` between tensors, but it never does this for parameters). The linked-parameter load process can then be thought of as a function that accepts this identifier and returns a `DLTensor*` or `NDArray` (depending on the C or C++ runtime) whose `data` field points to the pre-loaded parameter array.

This function could be implemented in a few different ways:

F1. Each [model runtime](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) could accept a standard data structure mapping `storage_id` to `void* data`.

F2. Each [model runtime](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025) could invoke a function in the TVM system runtime (i.e. CRT or C++ runtime) to do the same lookup as in F1.

F3. Each generated module could expose a standard function `__lookup_linked_param`.

F4. Each system runtime could load parameters given a standard data structure mapping model name and parameter string name to `void*` and then invoke `SetParam` on the [model runtime](https://discuss.tvm.apache.org/t/discuss-module-based-model-runtime-interface/5025).

F4 is difficult to implement because the model-name and parameter-name lookup is more complex and more expensive, and the API to set parameters (e.g. `TVMSetModelParameters(Module* m, const char* model_name, void* param_mapping)`) is harder for the user to invoke. It's also difficult to make automatic, because the TVM runtime has limited knowledge of when a new model-specific TVM Module is instantiated.

F2 suffers from a similar complexity problem (needing to key on both `storage_id` and `model_name`).

F1 is simple, but the data structure is not as easy to generate as it might seem. `storage_id` is not contiguous over the set of parameters, so the best implementation is a list of `(storage_id, data)` pairs, which is awkward to work with and slow to search. Additionally, user code would need to separately keep track of this list and provide it to the model runtime to load parameters.
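
For illustration, a sketch of the kind of structure F1 implies; the names here are hypothetical, not an existing TVM API:

```c
#include <stddef.h>

/* Hypothetical F1-style structure: storage_ids are sparse over the set of
 * parameters, so the "map" degenerates into a list of pairs that must be
 * searched linearly. */
typedef struct {
  int storage_id;
  const void* data;
} ParamEntry;

static const float param_p0[4] = {0.1f, 0.2f, 0.3f, 0.4f};
static const float param_p1[2] = {1.0f, -1.0f};

static const ParamEntry kLinkedParams[] = {
  {3, param_p0},  /* storage_ids need not be contiguous */
  {7, param_p1},
};

static const void* LookupLinkedParam(int storage_id) {
  for (size_t i = 0; i < sizeof(kLinkedParams) / sizeof(kLinkedParams[0]); ++i) {
    if (kLinkedParams[i].storage_id == storage_id) {
      return kLinkedParams[i].data;
    }
  }
  return NULL;  /* not a linked parameter */
}
```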

F3 is the best compromise: while no data-driven map exists, it offloads the lookup-speed optimization onto the compiler via a switch statement. It also gives hardware-accelerated loaders a chance to execute any initialization code needed at parameter-load time, such as waiting for background DMA transfers or decompression/decryption to complete. While this RFC doesn't consider heterogeneous execution contexts, this choice doesn't preclude their use at a later time.

In summary, the `llvm` and `c` codegens will generate an additional PackedFunc, `__lookup_linked_param`, in the generated `runtime::Module`. It accepts a unique integer `id` identifying the parameter and returns a `void*` that should populate the `data` member of that parameter's `DLTensor`.
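
For concreteness, a minimal sketch of what the `c` codegen might emit, using a simplified form of the PackedFunc calling convention (the exact signature, and how `id` values are assigned, are implementation details of the PoC):

```c
#include <stddef.h>
#include <tvm/runtime/c_runtime_api.h>

/* Illustrative parameter data; in practice emitted by the codegen. */
static const float __tvm_param__p0[4] = {0.1f, 0.2f, 0.3f, 0.4f};
static const float __tvm_param__p1[2] = {1.0f, -1.0f};

/* Sketch of a generated __lookup_linked_param. The compiler emits a switch
 * over the parameter id, so no runtime data structure is needed. */
int32_t __lookup_linked_param(TVMValue* args, int* type_codes, int num_args,
                              TVMValue* out_ret_value, int* out_ret_tcode) {
  (void)type_codes;
  (void)num_args;
  switch (args[0].v_int64) {
    case 0:
      out_ret_value->v_handle = (void*)__tvm_param__p0;
      break;
    case 1:
      out_ret_value->v_handle = (void*)__tvm_param__p1;
      break;
    default:
      out_ret_value->v_handle = NULL;  /* id does not name a linked parameter */
      break;
  }
  *out_ret_tcode = kTVMOpaqueHandle;
  return 0;
}
```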

### C2. Passing Parameters from Model-Level to Target-Level Codegen

Now that the job of the codegen is clear, the next challenge is passing parameters from the model level to the target level. Because the target-level codegen needs to include a new Module function, and the C runtime cannot rely on dynamic lookup such as `dlsym`, parameters need to be included in the same module as the generated functions.

However, at present, TVM is not guaranteed to invoke a target-level codegen for every model. Trivial models (e.g. `p0 + p1`) may be fully evaluated at compile time, in which case an empty module is returned. This can also happen when all functions are offloaded to accelerators.

Because of this, when linked parameters are generated, `BuildRelay` emits an additional function, `__lookup_linked_param`. At present, this function contains no TIR body; the target-specific codegen is expected to provide an implementation. It does, however, carry the parameters for the given module as an attribute, `tir.linked_params`.

When the target-specific codegen encounters this function and finds linked parameters attached, it translates those parameters' data into `static const` arrays and emits the `__lookup_linked_param` implementation. This produces one global symbol per parameter, easing the task of analyzing binary bloat.

This approach is somewhat hacky because, outside of the metadata module, TVM has no mechanism for including model-specific constant blobs. Since we prefer to avoid the metadata module due to the aforementioned linking concerns, we feel it's best to avoid defining another generic model-level blob packager until more examples appear.

### C1. Enabling Linked Parameters

Linked parameters could be enabled in a number of different ways:

W1. By marking each parameter with a special attribute. Each parameter with the attribute would be linked.

W2. With a target flag, `--link-params`.

W3. With an additional parameter to `relay.build` .

W4. With a PassContext option.

W1 gives the finest-grained control, but is complex because the generated parameters may differ from those passed to `relay.build` due to parameter simplification. It may be worth revisiting this approach when heterogeneous execution is considered.

W2 is the simplest, but it does mean that linked parameters require different autotuning schedules. It's not clear whether this is good or bad; for µTVM, parameter access time may differ when loading from flash vs. RAM, so separating the autotuning schedules is actually desirable.

W3 is a fairly high-level API change for such a specific feature. It also means that, unlike W2, the setting is not propagated to target-level codegens; those codegens then need to rely on other means (e.g. checking for the presence of the `__lookup_linked_param` TIR function) to identify a linked-parameter situation.

W4 is a reasonable choice, but it would not invalidate autotuning schedules, and it is a bit odd since, at present, linked parameters are not implemented as a TIR pass. One could envision the implementation moving into a TIR pass, though, so this is up for debate.

### Future Directions

This RFC doesn't tackle a number of challenges with pre-linking parameters, such as:

* Specifying a section for parameters
* Pinning each parameter to a specific memory location
* Supporting heterogeneous execution scenarios (e.g. offloading some parameters to BYOC)

In the future, additional configuration may be needed per parameter (e.g. section specification, specific address pinning, etc.). This could be done by expanding the `LinkedParamNode` class implemented in the PoC PR. It may be desirable to instead place this as an IRModule-level attribute. In a world where some parameters are linked using an external BYOC codegen, those parameters could either be omitted or, better, marked as such using `LinkedParamNode`.




