Posted to dev@tvm.apache.org by Lianmin Zheng via Apache TVM Discuss <no...@discuss.tvm.ai> on 2020/11/21 02:42:45 UTC

[Apache TVM Discuss] [Development/RFC] [RFC] Building a new reproducible benchmark for TVM


## Motivation
Currently, TVM lacks an up-to-date and reproducible benchmark. The only benchmark is hosted at [tvm/apps/benchmark](https://github.com/apache/incubator-tvm/tree/main/apps/benchmark). However, this benchmark is outdated and has several flaws:
1. The results were obtained 2 years ago.
2. The deep learning models are old. It does not include newer models (e.g., BERT, EfficientNet).
3. The input format is TVM's internal Relay format. It does not use formats from high-level frameworks (e.g., PyTorch, MXNet) or an open exchange format (e.g., ONNX).
4. It does not cover Intel CPUs.
5. It only provides pre-tuned configurations from [tophub](https://github.com/tlc-pack/tophub), but does not provide the scripts to generate these configurations.

This RFC aims to build a new open, reproducible benchmark for TVM. When the new benchmark is ready, we can run the evaluation nightly and run auto-tuning weekly or monthly.

## Approach
As the first step, we target three models, three hardware platforms and four code generation strategies.
To make the comparison with other frameworks easier, we choose ONNX as the input model format.

- models: ResNet-50, MobileNet V2 and BERT with batch size 1
- hardware platforms: NVIDIA GPU, Intel CPU, ARM CPU
- code generation strategies: AutoTVM, auto-scheduler, TVM + manual library, ONNX Runtime
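
As a rough illustration of the scope listed above, the sketch below enumerates the full cross product of models, platforms and strategies. The function name `benchmark_matrix` and the dictionary keys are hypothetical and not part of tlc-bench; it also assumes every strategy is run on every platform, which the RFC does not state.

```python
from itertools import product

# Names taken from the list above; the data layout is an assumption.
MODELS = ["ResNet-50", "MobileNet V2", "BERT"]
PLATFORMS = ["NVIDIA GPU", "Intel CPU", "ARM CPU"]
STRATEGIES = ["AutoTVM", "auto-scheduler", "TVM + manual library", "ONNX Runtime"]
BATCH_SIZE = 1


def benchmark_matrix():
    """Return one entry per (model, platform, strategy) combination."""
    return [
        {"model": m, "platform": p, "strategy": s, "batch_size": BATCH_SIZE}
        for m, p, s in product(MODELS, PLATFORMS, STRATEGIES)
    ]


if __name__ == "__main__":
    cases = benchmark_matrix()
    print(len(cases))  # 3 models * 3 platforms * 4 strategies = 36
```

In practice some combinations may not apply, so a real harness would likely filter this list.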

All logs generated during auto-tuning should be uploaded for future reference.

I created a [tlc-bench](https://github.com/tlc-pack/tlc-bench) repo and opened a [roadmap](https://github.com/tlc-pack/tlc-bench/issues/1#roadmap). I am looking for contributors who are interested in helping.





---
[Visit Topic](https://discuss.tvm.apache.org/t/rfc-building-a-new-reproducible-benchmark-for-tvm/8496/1) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/4517a303480de9ef009e887e58c551b59e87636fbca7969877b17fd4f32118c3).


Posted by Yao Wang via Apache TVM Discuss <no...@discuss.tvm.ai>.

Yeah. A performance regression test would be very nice. We often need to do a binary search over commits to find the one that caused a regression.
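
The binary search mentioned here can be sketched as follows. This is a minimal sketch, assuming the commits are ordered oldest to newest, the first is known good, the last is known bad, and the regression persists once introduced. `first_bad_commit` and `is_regressed` are hypothetical names; `is_regressed` stands in for building a commit and running the benchmark.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_regressed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_regressed is True.

    Assumes commits[0] is good and commits[-1] is bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]
    # Pretend the slowdown was introduced in c5.
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 5))
```

This needs about log2(N) benchmark runs for N commits, which is the manual cost a nightly regression test would avoid.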





---

Posted by Ziheng Jiang via Apache TVM Discuss <no...@discuss.tvm.ai>.

Great suggestion!

Can we make it a nightly/weekly regression test utility, and also consider adding accuracy evaluation for quantized models to this loop?





---

Posted by tqchen via Apache TVM Discuss <no...@discuss.tvm.ai>.

It would also be great to consider the output format described at https://tvm.apache.org/docs/dev/benchmark.html and iterate on a common log format.





---

Posted by "Cody H. Yu via Apache TVM Discuss" <no...@discuss.tvm.ai>.

Glad to see this is being planned! I can help with this as much as I can.

One question/suggestion: if we are going to have such a formal benchmarking approach, maybe we can make it MLPerf friendly, so that everyone can use this TVM utility to run these models on the target platform and submit the results to MLPerf.





---

Posted by Zhi via Apache TVM Discuss <no...@discuss.tvm.ai>.

It would be really nice to add regression tests against a selected set of models, since downstream users usually have to spend quite a lot of time finding the root cause once there is a regression. Otherwise they have to sync the upstream codebase as frequently as possible and test for regressions locally.

cc @jroesch, you may have some comments about the output format or the UX of the test infra.





---

Posted by Zhao Wu via Apache TVM Discuss <no...@discuss.tvm.ai>.

One question about the performance regression test: how do we judge normal fluctuation, especially on CPU? For example, ResNet-50 may take 20.00 ms, but becomes 20.88 ms after one PR.
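
One possible way to frame this question, offered only as a sketch and not as something proposed in the thread: measure each build several times and flag a regression only when the slowdown exceeds both a relative tolerance and a multiple of the baseline's standard deviation. The function name `is_regression` and the defaults (5% and 3 standard deviations) are assumptions chosen for illustration.

```python
import statistics
from typing import Sequence


def is_regression(baseline_ms: Sequence[float],
                  candidate_ms: Sequence[float],
                  rel_tol: float = 0.05,
                  num_std: float = 3.0) -> bool:
    """Flag a regression only if the mean slowdown exceeds the noise band.

    rel_tol and num_std are illustrative thresholds, not tuned values.
    baseline_ms needs at least two samples to estimate the deviation.
    """
    base_mean = statistics.mean(baseline_ms)
    base_std = statistics.stdev(baseline_ms)
    slowdown = statistics.mean(candidate_ms) - base_mean
    # The slowdown must clear both the relative and the statistical threshold.
    return slowdown > max(rel_tol * base_mean, num_std * base_std)


if __name__ == "__main__":
    baseline = [20.0, 20.1, 19.9, 20.05, 19.95]
    print(is_regression(baseline, [20.9, 20.85, 20.89]))  # about 4.4% slower
    print(is_regression(baseline, [22.0, 22.1, 21.9]))    # about 10% slower
```

With these assumed thresholds, the 20.00 ms to 20.88 ms case would be treated as fluctuation, while a 10% slowdown would be flagged. Where to set the thresholds per platform is exactly the open question raised here.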




