
Posted to dev@tvm.apache.org by Andrew Tulloch <no...@github.com> on 2019/07/18 00:24:23 UTC

[dmlc/tvm] [RFC] [Contrib] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

## Summary

This is an alternative implementation of a subset of the TVM runtime API (and
graph runtime) that focuses entirely on reducing code size, at the expense of
functionality (no tvm.extern(..) calls via PackedFunc, CPU only, etc.). It might
be worth incrementally expanding the surface area if there's interest.

## Motivation

The motivation for this work was to see what the minimal useful subset of the
TVM runtime is. This is relevant for extremely code-size-constrained
applications, e.g. in embedded/mobile. The current runtime is more like O(100KiB)
or so, so this might be compelling for some users.

The smaller surface area for auditing might make this relevant for
https://github.com/dmlc/tvm/issues/3159, or the use cases I was thinking about in
https://github.com/dmlc/tvm/issues/2523#issuecomment-459165815 re: the Rust
runtime.

## Analysis

The symbols in the tvm::minimalruntime space (i.e. excluding std:: and
picojson::) are about 5KiB, so I think there's a lot of room for further
reduction here. For example, we could replace picojson:: with
[`jsmn`](https://zserge.com/jsmn.html) or something similar, and we could replace
more of the `std::unordered_map` usage, etc. with custom primitives as well
(similar to the `DynArray`).
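
As a rough illustration of the kind of custom primitive meant here, the following is a minimal sketch of a `DynArray`-style container: a fixed-size, move-only heap array with no growth, no allocator template parameter and no exception paths. The interface is an assumption for illustration and is not copied from the PR's code.

```c++
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical DynArray-style container: the size is fixed at construction,
// there is no reallocation/growth logic, and copying is disabled so the only
// allocation is the one made in the constructor.
template <typename T>
class DynArray {
 public:
  DynArray() = default;
  explicit DynArray(size_t size)
      : data_(size ? new T[size]() : nullptr), size_(size) {}
  ~DynArray() { delete[] data_; }

  // Move-only: a copy would need a second allocation.
  DynArray(const DynArray&) = delete;
  DynArray& operator=(const DynArray&) = delete;
  DynArray(DynArray&& other) noexcept : data_(other.data_), size_(other.size_) {
    other.data_ = nullptr;
    other.size_ = 0;
  }
  DynArray& operator=(DynArray&& other) noexcept {
    if (this != &other) {
      delete[] data_;
      data_ = other.data_;
      size_ = other.size_;
      other.data_ = nullptr;
      other.size_ = 0;
    }
    return *this;
  }

  size_t size() const { return size_; }
  // Bounds are checked only in debug builds (assert), never via exceptions.
  T& operator[](size_t i) { assert(i < size_); return data_[i]; }
  const T& operator[](size_t i) const { assert(i < size_); return data_[i]; }
  T* begin() { return data_; }
  T* end() { return data_ + size_; }

 private:
  T* data_ = nullptr;
  size_t size_ = 0;
};
```

Dropping growth and copy support removes most of the template code that `std::vector` would otherwise instantiate per element type, which is where the code-size savings would come from.
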

You can view, comment on, or merge this pull request online at:

  https://github.com/dmlc/tvm/pull/3567

-- Commit Summary --

  * [RFC] [Contrib] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models

-- File Changes --

    M CMakeLists.txt (2)
    A cmake/modules/contrib/MinimalRuntime.cmake (23)
    A include/tvm/contrib/minimalruntime.h (42)
    A src/contrib/minimalruntime/README.md (22)
    A src/contrib/minimalruntime/minimalgraphruntime.cc (399)
    A src/contrib/minimalruntime/minimalgraphruntime.h (131)
    A src/contrib/minimalruntime/minimalruntime.cc (53)
    A src/contrib/minimalruntime/minimalruntime_api.cc (53)
    A src/contrib/minimalruntime/minimalruntime_api.h (45)
    A src/contrib/minimalruntime/minimalvector.h (101)
    A src/contrib/minimalruntime/picojson.h (1204)
    A tests/cpp/contrib_minimalruntime_test.cc (135)

-- Patch Links --

https://github.com/dmlc/tvm/pull/3567.patch
https://github.com/dmlc/tvm/pull/3567.diff

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/pull/3567

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

Most microcontrollers do have stacks (heaps), and we just need to pre-define a section in the memory space and implement an arena-style allocator (always allocate, never de-allocate) that recycles all memory at an RAII point.
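
A minimal sketch of this idea, assuming the application hands the runtime one fixed, pre-defined memory region. The `Arena` and `ArenaScope` names are hypothetical, not TVM API: allocation bumps an offset, individual frees do nothing, and the scope object is the RAII point that recycles everything allocated while it was alive.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a caller-provided region.
class Arena {
 public:
  Arena(uint8_t* base, size_t capacity) : base_(base), capacity_(capacity) {}

  // Returns nullptr when the region is exhausted (no exceptions thrown).
  void* Alloc(size_t bytes, size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two
    uintptr_t base = reinterpret_cast<uintptr_t>(base_);
    uintptr_t aligned =
        (base + offset_ + align - 1) & ~(static_cast<uintptr_t>(align) - 1);
    size_t aligned_off = static_cast<size_t>(aligned - base);
    if (aligned_off > capacity_ || bytes > capacity_ - aligned_off) return nullptr;
    offset_ = aligned_off + bytes;
    return base_ + aligned_off;
  }

  size_t used() const { return offset_; }
  void Rewind(size_t offset) { assert(offset <= offset_); offset_ = offset; }

 private:
  uint8_t* base_;
  size_t capacity_;
  size_t offset_ = 0;
};

// RAII point: remembers the arena offset on entry and rewinds to it on exit,
// recycling everything allocated in between in one step.
class ArenaScope {
 public:
  explicit ArenaScope(Arena& arena) : arena_(arena), mark_(arena.used()) {}
  ~ArenaScope() { arena_.Rewind(mark_); }
  ArenaScope(const ArenaScope&) = delete;
  ArenaScope& operator=(const ArenaScope&) = delete;

 private:
  Arena& arena_;
  size_t mark_;
};
```

Because nothing is freed individually, the region cannot fragment; the trade-off is that memory only comes back when the enclosing scope ends.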

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512625976

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Thierry Moreau <no...@github.com>.

+1 on getting this integrated with uTVM. @weberlo, care to take a look at this PR and make some high-level comments?

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512684531

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Marcus Shawcroft <no...@github.com>.

> @mshawcroft @weberlo @tmoreau89 please help to review if you have time and https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

So I've not had the time to study the code in detail, sorry; I would like to, but it won't happen this week. Skimming the code does raise one immediate question:

Are we sure the memory management policy implemented in the module does not lead to fragmentation?


-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-514129051

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Logan Weber <no...@github.com>.

@ajtulloch Awesome work on this!  We'll need a runtime for uTVM when we want to try self-hosted models, so the timing on this is great.

My general understanding is that it's much more common for bare-metal devices to support C, so it'd be interesting to see if we could incrementally whittle this down to pure C, like @mshawcroft said.  Even if not, this would be a nice bonus for users targeting devices that *do* have C++ support.

If we want to merge this into the µTVM namespace, `src/runtime/micro/standalone` seems fine.  But since this is code that would be loaded onto the _device_, we could also put it in `src/runtime/micro/device/standalone`.  Then we could move the current runtime in `device` into its own subfolder `device/host_driven` (or we could name it something else).

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-513428290

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

> This looks great. As mentioned above it potentially fits well with uTVM. For use with uTVM it would be useful to have this runtime or a derivative built in C rather than C++ in order to be deployable to the various embedded environments out there that don't have C++ runtime / tooling support.

@mshawcroft oh interesting - yeah, I started off with a pure C API (https://github.com/dmlc/tvm/pull/3567/files#diff-cf8621d821243d3ba906f0d9154abcea), but internally it's implemented in C++ (although it's deliberately designed to be compiled with -fno-rtti, -fno-exceptions, etc.). Is the constraint that any use of C++ makes this unsuitable for embedded environments?
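
One common way to pair a C-callable API with a restricted-C++ implementation is an opaque handle plus `extern "C"` entry points. The sketch below uses hypothetical names (`MiniRuntimeHandle`, `MiniRuntimeCreate`, and so on) rather than the PR's actual header, and constructs the object with placement new in caller-provided storage, so the implementation needs neither `new`/`delete` nor RTTI or exceptions.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// ---- What a C header would expose (hypothetical names). ----
extern "C" {
typedef struct MiniRuntime* MiniRuntimeHandle;  // opaque to C callers
MiniRuntimeHandle MiniRuntimeCreate(void* storage, size_t storage_size);
int MiniRuntimeRun(MiniRuntimeHandle handle);
void MiniRuntimeDestroy(MiniRuntimeHandle handle);
}

// ---- C++ implementation, buildable with -fno-rtti -fno-exceptions. ----
struct MiniRuntime {
  int runs = 0;
  int Run() { return ++runs; }  // stand-in for executing the graph
};

extern "C" MiniRuntimeHandle MiniRuntimeCreate(void* storage, size_t storage_size) {
  // Construct in caller-provided storage instead of calling new/malloc.
  if (storage == nullptr || storage_size < sizeof(MiniRuntime)) return nullptr;
  if (reinterpret_cast<uintptr_t>(storage) % alignof(MiniRuntime) != 0) return nullptr;
  return new (storage) MiniRuntime();
}

extern "C" int MiniRuntimeRun(MiniRuntimeHandle handle) {
  return handle ? handle->Run() : -1;
}

extern "C" void MiniRuntimeDestroy(MiniRuntimeHandle handle) {
  // Explicit destructor call; the storage itself still belongs to the caller.
  if (handle) handle->~MiniRuntime();
}
```

Only the `extern "C"` declarations (wrapped in the usual `#ifdef __cplusplus` guard) would go into a header included from C; the struct definition stays in a C++ translation unit.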

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512922632

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Marcus Shawcroft <no...@github.com>.

@ajtulloch the situation is not black and white. At one end of the scale is pure 'C', at the other end is 'C++' using the standard C++ libraries and all the language bells and whistles, and in the middle is a bunch of intermediate restricted subsets of C++ with arbitrary subsets of the C++ std library. The broadest reach and lowest friction for potential users is at the C end of the scale. Aside from the language subset used, other issues are the availability (and size!) of the std C++ library on a platform, and the memory management strategy used (at the small end, memory fragmentation kills you, hence arbitrary use of the heap is undesirable). By way of example, last time I checked, Zephyr RTOS's C++ application support was broadly: no use of new / delete, no RTTI, no exceptions, no static global object destruction... (note that the new/delete ban has a significant impact on the std C++ library available!). Other RTOS environments are richer; others are more constrained.

There is a limited cost to the tvm community to provide a 'C' runtime rather than a C++ runtime, but doing so broadens tvm's reach.

BTW... I'm really excited to see all the current activity in the uTVM, small runtime, embedded space... ;-)

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512987196

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

@tqchen does this look good to you?

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-514055095

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

Will do today @tqchen, my bad.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-515897387

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

OK, so changes planned are:

- Move this to `src/runtime/micro/standalone`
- Rename flag from `MINIMAL_RUNTIME` to `MICRO_STANDALONE_RUNTIME`
- Fix CI


Will work on it right now, thank you folks.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-513964347

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

@tqchen sure, will do on the weekend.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-517901933

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

@weberlo e.g. CPU CNNs like mobilenet, resnet, etc. One thing not supported is e.g. tvm.extern, since we don't support packed funcs.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-514596003

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

Given the current discussions, perhaps we can decide on the naming and make a few improvements, if you feel you can push some of them in a few days. Then we merge it in.

In terms of naming and code location, given the relation to uTVM, we could think about a good name for the minimal runtime. One example is "src/runtime/micro/standalone"; perhaps @mshawcroft @ajtulloch @weberlo have better ideas.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-513086660

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

From what I see and also summarize the discussion:

- No new/delete, but allow the use of custom allocators that do arena-like allocations.
- C++ is fine, templates are fine, but maybe no STL.

The arena-style allocator may be fine for most of our cases; the idea is that we always allocate and de-allocate in bulk. This allows us to keep most of the allocations in a single user-defined stack on a memory region.

```c++
void MyApp() {
  // RAII: everything allocated within this function draws from the arena,
  // and the space is released in bulk when ctx goes out of scope as MyApp exits.
  tvm::micro::AllocatorContext ctx;
}
```

A slight variation would be having the allocator remember the number of objects it has allocated so far in the current context; when we call free, it only decreases the counter, and we recycle everything when the counter goes to zero. This should work for most cases we care about (where the allocation/free pattern is stack-like).
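
The counting variation could look roughly like the sketch below. `CountingArena` is a hypothetical name; the sketch assumes the region's base is aligned to `alignof(std::max_align_t)` and computes alignment relative to that base.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counting arena: Free() only decrements a live-object counter,
// and the whole region is recycled once that counter reaches zero.
// Assumes base is aligned to alignof(std::max_align_t).
class CountingArena {
 public:
  CountingArena(uint8_t* base, size_t capacity) : base_(base), capacity_(capacity) {}

  void* Alloc(size_t bytes) {
    const size_t align = alignof(std::max_align_t);
    size_t start = (offset_ + align - 1) & ~(align - 1);
    if (start > capacity_ || bytes > capacity_ - start) return nullptr;
    offset_ = start + bytes;
    ++live_;
    return base_ + start;
  }

  void Free(void* ptr) {
    if (ptr == nullptr) return;
    assert(live_ > 0);
    if (--live_ == 0) offset_ = 0;  // nothing is alive: recycle everything
  }

  size_t used() const { return offset_; }
  size_t live() const { return live_; }

 private:
  uint8_t* base_;
  size_t capacity_;
  size_t offset_ = 0;
  size_t live_ = 0;
};
```

The cost of this design is that space released by early `Free` calls is not reused until every outstanding allocation has been freed, so it suits stack-like lifetimes but not long-lived objects interleaved with short-lived ones.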

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-513006732

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

@ajtulloch can you act on the final comments and let us get it in:)

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-515707991

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

Merged #3567 into master.

-- 
https://github.com/dmlc/tvm/pull/3567#event-2630318412

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Andrew Tulloch <no...@github.com>.

@tqchen yes absolutely - from talking to you yesterday I hadn't thought of the uTVM application, but it certainly could be interesting. One possible improvement in that direction could be to create an mmap'able representation of the parsed graph_json, i.e. these fields of `MinimalGraphRuntime`:

```
  DynArray<Node> nodes_;
  DynArray<uint32_t> input_nodes_;
  DynArray<uint32_t> node_row_ptr_;
  DynArray<NodeEntry> outputs_;
```

which would allow us to 'allocation-free' construct the GraphRuntime (and eliminate the code-size cost of the json parser), and then the remaining allocations are the NDArray tensor allocations themselves which could be handled via a static storage plan or similar?
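
To make the idea concrete, here is a sketch of one possible flat encoding for the index arrays (`node_row_ptr_`, `input_nodes_`, `outputs_`) that can be used in place from a mapped buffer. The word layout, `kGraphMagic`, `GraphView` and `LoadGraphView` are all hypothetical; a real format would also need the per-node data (op names, attributes) and some versioning.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical flat layout, a sequence of uint32_t words:
//   [magic, num_nodes, num_inputs, num_outputs,
//    node_row_ptr[num_nodes + 1], input_nodes[num_inputs],
//    outputs[num_outputs] as (node_id, index) pairs]
constexpr uint32_t kGraphMagic = 0x47524148u;  // hypothetical tag

struct NodeEntryView {
  uint32_t node_id;
  uint32_t index;
};

struct GraphView {
  uint32_t num_nodes = 0, num_inputs = 0, num_outputs = 0;
  const uint32_t* node_row_ptr = nullptr;
  const uint32_t* input_nodes = nullptr;
  const uint32_t* outputs = nullptr;  // two words per output

  NodeEntryView output(uint32_t i) const {
    assert(i < num_outputs);
    return NodeEntryView{outputs[2 * i], outputs[2 * i + 1]};
  }
};

// Points the view into the blob without copying or allocating.
// Returns false if the blob is truncated or carries the wrong tag.
bool LoadGraphView(const uint32_t* words, size_t num_words, GraphView* view) {
  if (num_words < 4 || words[0] != kGraphMagic) return false;
  GraphView v;
  v.num_nodes = words[1];
  v.num_inputs = words[2];
  v.num_outputs = words[3];
  // Computed in 64 bits so that large counts cannot wrap around.
  uint64_t needed = 4 + (uint64_t(v.num_nodes) + 1) + v.num_inputs +
                    2 * uint64_t(v.num_outputs);
  if (needed > num_words) return false;
  const uint32_t* p = words + 4;
  v.node_row_ptr = p;
  p += static_cast<size_t>(v.num_nodes) + 1;
  v.input_nodes = p;
  p += v.num_inputs;
  v.outputs = p;
  *view = v;
  return true;
}
```

Loading reduces to bounds checking plus pointer arithmetic, so neither a JSON parser nor any allocation is involved for these arrays.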


-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512623009

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Marcus Shawcroft <no...@github.com>.

This looks great.  As mentioned above it potentially fits well with uTVM.  For use with uTVM it would be useful to have this runtime or a derivative built in C rather than C++ in order to be deployable to the various embedded environments out there that don't have C++ runtime / tooling support. 

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512713925

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

ping @ajtulloch 

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-520491694

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

Thanks @ajtulloch @weberlo @antinucleon @mshawcroft, this PR is now merged

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-530972493

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

@mshawcroft  @weberlo @tmoreau89 please help to review if you have time and https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-514058035

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

To make this PR actionable, @ajtulloch can you decide on the namespace choices, make the changes, fix the CI and let us merge it in?

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-513870114

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Logan Weber <no...@github.com>.

@ajtulloch Which models have you been able to run on this runtime so far?

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-514424051

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

@ajtulloch please look into the CI error and see if we can fix it.

-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-517433044

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

@antinucleon @weberlo please https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

To be clear, the current set of changes does not yet meet the no-std requirement: it still depends on new/malloc, etc. Further refactoring will be necessary to make sure that the uTVM standalone runtime takes in a pre-allocated memory region and only uses memory from that region for most of its allocations.
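
A sketch of what taking a pre-allocated region could look like, assuming an offline static storage plan that gives every storage entry a byte offset and size. `StorageEntry` and `BindStoragePlan` are hypothetical names, not TVM API; the runtime only validates the plan against the caller's region and turns offsets into pointers.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical plan entry, produced ahead of time (e.g. at compile time).
struct StorageEntry {
  size_t offset;  // byte offset within the caller-owned region
  size_t bytes;   // size of the buffer
};

// Resolves each entry to a pointer inside [region, region + region_bytes).
// Returns false if any entry falls outside the region or is misaligned.
// No new/malloc is involved: all memory comes from the caller's region.
bool BindStoragePlan(uint8_t* region, size_t region_bytes,
                     const StorageEntry* plan, size_t num_entries,
                     void** out_ptrs, size_t alignment = 16) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  for (size_t i = 0; i < num_entries; ++i) {
    const StorageEntry& e = plan[i];
    if (e.offset > region_bytes || e.bytes > region_bytes - e.offset) return false;
    uint8_t* p = region + e.offset;
    if (reinterpret_cast<uintptr_t>(p) % alignment != 0) return false;
    out_ptrs[i] = p;
  }
  return true;
}
```

Entries whose lifetimes never overlap can be given the same offset, which is how a storage planner keeps the required region small.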



-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-515135538

Re: [dmlc/tvm] [RFC] [Contrib] [Runtime] Minimal runtime (~12kb .text on ARMv7/x86) for subset of TVM models (#3567)

Posted by Tianqi Chen <no...@github.com>.

This is a great step toward putting TVM into more resource-constrained devices. Given that we have another effort (uTVM, @weberlo) that aims to enable automatic optimizations, we still lack a minimal runtime that we can serve on the device.

This PR seems to be a great step in that direction. One thing we can try is to consolidate it with uTVM and put it under the tvm/runtime/micro namespace later. A fun challenge would be to iterate further to remove most dependencies on the OS (mainly allocation) so we can really run it on bare-metal devices.


-- 
https://github.com/dmlc/tvm/pull/3567#issuecomment-512619344