Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/07/21 21:38:53 UTC

[GitHub] [tvm] areusch commented on pull request #12087: [UMA] UMA v1.0

areusch commented on PR #12087:
URL: https://github.com/apache/tvm/pull/12087#issuecomment-1191960637

   we discussed this in the [Community Meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-july-20/13148/2) yesterday. here are notes on the discussion:
   - when the design references "Accelerator A" and "Accelerator B," does this mean we're using both simultaneously?
     - not in this v1, though the architecture supports it. at present they can simply coexist as options.
   - should we integrate this with TVMC?
     - @areusch: it should be fairly easy to integrate the UMA targets with the `tvmc run` command
     - @manupa-arm : this should be pretty straightforward to add to tvmc. the bigger concern here was around `uma_cli.py`, which is supposed to generate a starter implementation for new accelerators in uma.
     - @areusch : we should either have tvmc or some other developer-facing entry point to house tools like this. probably not bad to add dev tools to tvmc now--we can always migrate them out if we need to.
     - @MichaelJKlaiber : intention of uma_cli is just to make the tutorial easier to replicate on your own, so there are two steps there--create the accelerator flow and then run inference.
     - @manupa-arm : do we expect the CLI to work in an environment where only the tvm wheel is present? e.g. what about the C sources included with the accelerator? should those go in the wheel?
     - @MichaelJKlaiber: those sources are copied into the generated dir by uma_cli.
     - @areusch : what's the include path folks are expected to set on their downstream C compiler? it seems like the C files included with the accelerator template should really make it into the Model Library Format. Could produce another CSourceModule, which would create another e.g. `default_lib3.cc` in the MLF. Could also use the `import_c` pragma, [similar](https://github.com/apache/tvm/blob/main/python/tvm/topi/arm_cpu/mprofile/dsp/micro_kernel/max_pool.py#L87) to how we do it for microTVM (see the `import_c` sketch after these notes).
   - where should the template live?
     - @areusch : could go either way, or both. how do we expect people to package their accelerator flow? if merging into mainline, perhaps we want it in the python import path. if keeping the accelerator flow private, perhaps `apps` is closer to carrying that code alongside the tvm wheel.
     - @manupa-arm : deciding intended location based on whether a flow will get upstreamed makes sense. `_template` is an example rather than a target, so maybe `apps` could make more sense for it.
   - @manupa-arm : also suggest to break the CLI changes into another PR.
   
   - @MichaelJKlaiber : only the Vanilla accelerator was implemented; do folks have suggestions for Chocolate and Strawberry? feel free to post in the discuss thread or get in touch.
     - @areusch : would be cool to see something that leverages usmp to model physical accelerator memories. could also be cool to see an example where buffers were marked to live on-device.
   - Slava: are the optimizations provided in the default TVM pipeline also part of the UMA pipeline?
     - @areusch : you can classify the optimizations in terms of Relay passes, scheduling, and post-scheduling passes. TVM tries to operate on an IRModule-to-IRModule principle, where each optimization or step takes an IRModule and returns an IRModule. when you mark a subgraph as offloaded to a UMA pipeline, some optimizations aren't enabled--for example, Relay-level operator fusion. Others, e.g. those that operate post-scheduling (USMP, for example), will still run on UMA operators (a rough backend sketch follows these notes).
     - Slava: if I have a conv2d followed by batch norm, and only the conv2d is offloaded, then the batch norm is not fused by default?
     - @areusch: the right way to do that would be to mark both as offloaded and do the fusion yourself. there are also some efforts to enable post-scheduling fusion via Relax, but those haven't landed yet.
   - Slava: what's the best way to leverage UMA if e.g. we have 2 different implementations of conv2d depending on kernel size?
     - @areusch : you'd need to give your pattern matcher enough fidelity to differentiate those two workloads. you can also inspect the matched subgraph after using a looser pattern (see the pattern sketch after these notes).
   - Slava: what's the rough timeline?
     - not really a timeline, but see https://github.com/apache/tvm/issues/11260
   - @MichaelJKlaiber : happy to discuss further questions with folks in a higher-bandwidth setting.
     - suggest folks post up on the discuss forum. we can also use this meeting for further discussion.
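
   On shipping the accelerator's C sources: below is a minimal sketch of the `import_c` pragma mechanism that microTVM's DSP kernels use to splice extern C source into a `c`-target module. The helper source and names are made up for illustration; how UMA ultimately packages template sources may differ.

   ```python
   import tvm
   from tvm import te

   # Hypothetical helper source an accelerator template might ship (placeholder).
   C_SOURCE = r"""
   // Generated TIR kernels could call into helpers like this one.
   static inline int32_t my_accel_clip(int32_t x) { return x < 0 ? 0 : x; }
   """

   n = 16
   A = te.placeholder((n,), dtype="int32", name="A")
   B = te.compute((n,), lambda i: A[i] + 1, name="B")

   s = te.create_schedule(B.op)
   # The "import_c" pragma asks the C codegen to embed C_SOURCE in its output,
   # the same mechanism the microTVM DSP kernels use for extern C intrinsics.
   s[B].pragma(B.op.axis[0], "import_c", C_SOURCE)

   built = tvm.build(s, [A, B], target="c")
   print(built.get_source()[:300])  # the embedded helper appears in the generated C
   ```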
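
   On the pipeline question: a UMA backend hooks its own passes into the standard flow rather than replacing it. Here is a rough sketch modeled on the UMA tutorial in this PR; `conv2d_pattern`, `MyConv2dPass`, and `gen_includes` are placeholders you would implement yourself, and the exact import paths may differ from what finally lands.

   ```python
   from tvm.relay.backend.contrib.uma.api.utils import PassPhase
   from tvm.relay.backend.contrib.uma.backend import UMABackend

   # Placeholders you would provide for your accelerator (not part of this PR):
   from my_accel.patterns import conv2d_pattern  # relay dataflow pattern
   from my_accel.passes import MyConv2dPass      # TIR pass lowering the offloaded conv2d
   from my_accel.codegen import gen_includes     # returns the '#include ...' lines as a string


   class MyAccelBackend(UMABackend):
       """Hooks an accelerator into the standard TVM pipeline via UMA."""

       def __init__(self):
           super().__init__()
           # Relay: which subgraphs get offloaded to the accelerator.
           self._register_pattern("conv2d", conv2d_pattern())
           # TIR: accelerator-specific lowering of the offloaded functions.
           self._register_tir_pass(PassPhase.TIR_PHASE_0, MyConv2dPass())
           # Codegen: emit C and prepend the accelerator's includes.
           self._register_codegen(fmt="c", includes=gen_includes)

       @property
       def target_name(self):
           return "my_accel"
   ```

   Offloading then looks roughly like `backend = MyAccelBackend(); backend.register(); mod = backend.partition(mod)` followed by a normal `relay.build`; post-scheduling passes such as USMP still see the partitioned functions.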
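
   On differentiating two conv2d implementations by kernel size: one option is to keep a loose pattern and attach a check that inspects the matched call's attributes. A small sketch (the names and the exact wiring into the pattern registration are assumptions):

   ```python
   from tvm.relay.dataflow_pattern import is_op, wildcard

   # Loose pattern: match any nn.conv2d.
   conv2d_pat = is_op("nn.conv2d")(wildcard(), wildcard())


   def conv2d_is_1x1(call):
       """Check run on the matched call: route 1x1 convolutions to one kernel."""
       return [int(k) for k in call.attrs.kernel_size] == [1, 1]


   # Hypothetical split into two named patterns whose checks partition the
   # workloads, e.g. as relay pattern-table entries (name, pattern, check):
   #   ("my_accel.conv2d_1x1", conv2d_pat, conv2d_is_1x1)
   #   ("my_accel.conv2d",     conv2d_pat, lambda call: not conv2d_is_1x1(call))
   ```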


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org