Posted to discuss-archive@tvm.apache.org by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/08/27 15:08:17 UTC
[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster
I tried increasing ```num_measure_trials``` to 20000. The model has 29 layers, so the trial budget comes to roughly 800*29=23200, and I chose 20000. After that, I got about the same result as the ONNX model running on onnxruntime:
```
onnxruntime model: 0.4735 s
tvm model after tuning: 0.5078 s
```
Is there a way to make it run faster?
Here is the tail of ```total_latency.tsv```:
```
ElapsedTime(s) 17787 EstimatedLatency(ms) 471.479 Trials 10176
ElapsedTime(s) 17837 EstimatedLatency(ms) 471.479 Trials 10240
ElapsedTime(s) 18032 EstimatedLatency(ms) 471.479 Trials 10304
ElapsedTime(s) 18128 EstimatedLatency(ms) 471.479 Trials 10368
ElapsedTime(s) 18224 EstimatedLatency(ms) 471.442 Trials 10432
ElapsedTime(s) 18275 EstimatedLatency(ms) 470.814 Trials 10496
ElapsedTime(s) 18328 EstimatedLatency(ms) 469.353 Trials 10560
ElapsedTime(s) 18379 EstimatedLatency(ms) 468.845 Trials 10624
ElapsedTime(s) 18461 EstimatedLatency(ms) 468.845 Trials 10688
ElapsedTime(s) 18658 EstimatedLatency(ms) 468.845 Trials 10752
ElapsedTime(s) 18692 EstimatedLatency(ms) 467.702 Trials 10816
ElapsedTime(s) 18738 EstimatedLatency(ms) 467.702 Trials 10880
ElapsedTime(s) 18934 EstimatedLatency(ms) 467.702 Trials 10944
ElapsedTime(s) 18980 EstimatedLatency(ms) 467.702 Trials 11008
ElapsedTime(s) 19076 EstimatedLatency(ms) 467.702 Trials 11072
ElapsedTime(s) 19164 EstimatedLatency(ms) 467.702 Trials 11136
ElapsedTime(s) 19360 EstimatedLatency(ms) 467.702 Trials 11200
ElapsedTime(s) 19451 EstimatedLatency(ms) 467.702 Trials 11264
ElapsedTime(s) 19645 EstimatedLatency(ms) 467.702 Trials 11328
ElapsedTime(s) 19697 EstimatedLatency(ms) 467.702 Trials 11392
ElapsedTime(s) 19785 EstimatedLatency(ms) 467.702 Trials 11456
ElapsedTime(s) 19886 EstimatedLatency(ms) 467.702 Trials 11520
ElapsedTime(s) 19946 EstimatedLatency(ms) 467.702 Trials 11584
ElapsedTime(s) 20138 EstimatedLatency(ms) 467.702 Trials 11648
ElapsedTime(s) 20199 EstimatedLatency(ms) 467.702 Trials 11712
ElapsedTime(s) 20402 EstimatedLatency(ms) 467.702 Trials 11776
ElapsedTime(s) 20459 EstimatedLatency(ms) 466.907 Trials 11840
ElapsedTime(s) 20544 EstimatedLatency(ms) 466.907 Trials 11904
ElapsedTime(s) 20658 EstimatedLatency(ms) 466.907 Trials 11968
ElapsedTime(s) 20751 EstimatedLatency(ms) 466.907 Trials 12032
ElapsedTime(s) 20945 EstimatedLatency(ms) 466.907 Trials 12096
ElapsedTime(s) 20987 EstimatedLatency(ms) 465.989 Trials 12160
ElapsedTime(s) 21176 EstimatedLatency(ms) 465.989 Trials 12224
ElapsedTime(s) 21219 EstimatedLatency(ms) 465.989 Trials 12288
ElapsedTime(s) 21326 EstimatedLatency(ms) 465.989 Trials 12352
ElapsedTime(s) 21423 EstimatedLatency(ms) 465.989 Trials 12416
ElapsedTime(s) 21472 EstimatedLatency(ms) 465.989 Trials 12480
ElapsedTime(s) 21665 EstimatedLatency(ms) 465.989 Trials 12544
ElapsedTime(s) 21720 EstimatedLatency(ms) 465.989 Trials 12608
ElapsedTime(s) 21784 EstimatedLatency(ms) 465.747 Trials 12672
ElapsedTime(s) 21877 EstimatedLatency(ms) 465.747 Trials 12736
ElapsedTime(s) 22071 EstimatedLatency(ms) 465.747 Trials 12800
ElapsedTime(s) 22181 EstimatedLatency(ms) 465.747 Trials 12864
ElapsedTime(s) 22274 EstimatedLatency(ms) 465.488 Trials 12928
ElapsedTime(s) 22465 EstimatedLatency(ms) 464.841 Trials 12992
ElapsedTime(s) 22522 EstimatedLatency(ms) 464.773 Trials 13056
ElapsedTime(s) 22719 EstimatedLatency(ms) 464.773 Trials 13120
ElapsedTime(s) 22807 EstimatedLatency(ms) 464.773 Trials 13184
ElapsedTime(s) 23010 EstimatedLatency(ms) 464.773 Trials 13248
ElapsedTime(s) 23063 EstimatedLatency(ms) 464.773 Trials 13312
ElapsedTime(s) 23162 EstimatedLatency(ms) 464.773 Trials 13376
ElapsedTime(s) 23264 EstimatedLatency(ms) 464.773 Trials 13440
ElapsedTime(s) 23465 EstimatedLatency(ms) 464.773 Trials 13504
ElapsedTime(s) 23533 EstimatedLatency(ms) 464.773 Trials 13568
ElapsedTime(s) 23633 EstimatedLatency(ms) 464.773 Trials 13632
ElapsedTime(s) 23688 EstimatedLatency(ms) 464.773 Trials 13696
ElapsedTime(s) 23778 EstimatedLatency(ms) 464.773 Trials 13760
ElapsedTime(s) 23865 EstimatedLatency(ms) 464.773 Trials 13824
ElapsedTime(s) 24060 EstimatedLatency(ms) 464.773 Trials 13888
ElapsedTime(s) 24111 EstimatedLatency(ms) 464.773 Trials 13952
ElapsedTime(s) 24158 EstimatedLatency(ms) 464.773 Trials 14016
ElapsedTime(s) 24355 EstimatedLatency(ms) 464.773 Trials 14080
ElapsedTime(s) 24455 EstimatedLatency(ms) 464.773 Trials 14144
ElapsedTime(s) 24652 EstimatedLatency(ms) 464.773 Trials 14208
ElapsedTime(s) 24749 EstimatedLatency(ms) 464.773 Trials 14272
ElapsedTime(s) 24844 EstimatedLatency(ms) 464.773 Trials 14336
ElapsedTime(s) 24922 EstimatedLatency(ms) 464.773 Trials 14400
ElapsedTime(s) 25119 EstimatedLatency(ms) 464.773 Trials 14464
ElapsedTime(s) 25219 EstimatedLatency(ms) 464.773 Trials 14528
ElapsedTime(s) 25305 EstimatedLatency(ms) 464.773 Trials 14592
ElapsedTime(s) 25356 EstimatedLatency(ms) 464.773 Trials 14656
ElapsedTime(s) 25555 EstimatedLatency(ms) 464.773 Trials 14720
ElapsedTime(s) 25610 EstimatedLatency(ms) 464.109 Trials 14784
ElapsedTime(s) 25656 EstimatedLatency(ms) 463.423 Trials 14848
ElapsedTime(s) 25703 EstimatedLatency(ms) 463.383 Trials 14912
ElapsedTime(s) 25784 EstimatedLatency(ms) 463.383 Trials 14976
ElapsedTime(s) 25982 EstimatedLatency(ms) 463.383 Trials 15040
ElapsedTime(s) 26039 EstimatedLatency(ms) 463.383 Trials 15104
ElapsedTime(s) 26131 EstimatedLatency(ms) 463.383 Trials 15168
ElapsedTime(s) 26174 EstimatedLatency(ms) 462.095 Trials 15232
ElapsedTime(s) 26370 EstimatedLatency(ms) 461.838 Trials 15296
ElapsedTime(s) 26464 EstimatedLatency(ms) 461.647 Trials 15360
ElapsedTime(s) 26667 EstimatedLatency(ms) 461.647 Trials 15424
ElapsedTime(s) 26723 EstimatedLatency(ms) 460.820 Trials 15488
ElapsedTime(s) 26777 EstimatedLatency(ms) 460.820 Trials 15552
ElapsedTime(s) 26856 EstimatedLatency(ms) 460.820 Trials 15616
ElapsedTime(s) 26960 EstimatedLatency(ms) 460.820 Trials 15680
ElapsedTime(s) 27057 EstimatedLatency(ms) 460.820 Trials 15744
ElapsedTime(s) 27256 EstimatedLatency(ms) 460.820 Trials 15808
ElapsedTime(s) 27352 EstimatedLatency(ms) 460.820 Trials 15872
ElapsedTime(s) 27553 EstimatedLatency(ms) 460.820 Trials 15936
ElapsedTime(s) 27610 EstimatedLatency(ms) 460.739 Trials 16000
ElapsedTime(s) 27658 EstimatedLatency(ms) 460.739 Trials 16064
ElapsedTime(s) 27760 EstimatedLatency(ms) 460.739 Trials 16128
ElapsedTime(s) 27803 EstimatedLatency(ms) 460.502 Trials 16192
ElapsedTime(s) 28003 EstimatedLatency(ms) 460.123 Trials 16256
ElapsedTime(s) 28062 EstimatedLatency(ms) 459.634 Trials 16320
ElapsedTime(s) 28171 EstimatedLatency(ms) 459.381 Trials 16384
ElapsedTime(s) 28371 EstimatedLatency(ms) 457.684 Trials 16448
ElapsedTime(s) 28569 EstimatedLatency(ms) 457.684 Trials 16512
ElapsedTime(s) 28765 EstimatedLatency(ms) 457.684 Trials 16576
ElapsedTime(s) 28854 EstimatedLatency(ms) 457.684 Trials 16640
ElapsedTime(s) 29043 EstimatedLatency(ms) 457.570 Trials 16704
ElapsedTime(s) 29151 EstimatedLatency(ms) 457.570 Trials 16768
ElapsedTime(s) 29244 EstimatedLatency(ms) 457.570 Trials 16832
ElapsedTime(s) 29315 EstimatedLatency(ms) 457.570 Trials 16896
ElapsedTime(s) 29373 EstimatedLatency(ms) 457.570 Trials 16960
ElapsedTime(s) 29461 EstimatedLatency(ms) 457.570 Trials 17024
ElapsedTime(s) 29561 EstimatedLatency(ms) 457.570 Trials 17088
ElapsedTime(s) 29622 EstimatedLatency(ms) 456.816 Trials 17152
ElapsedTime(s) 29717 EstimatedLatency(ms) 456.667 Trials 17216
ElapsedTime(s) 29814 EstimatedLatency(ms) 456.667 Trials 17280
ElapsedTime(s) 29864 EstimatedLatency(ms) 456.667 Trials 17344
ElapsedTime(s) 29907 EstimatedLatency(ms) 455.467 Trials 17408
ElapsedTime(s) 29999 EstimatedLatency(ms) 455.406 Trials 17472
ElapsedTime(s) 30085 EstimatedLatency(ms) 455.406 Trials 17536
ElapsedTime(s) 30286 EstimatedLatency(ms) 455.406 Trials 17600
ElapsedTime(s) 30337 EstimatedLatency(ms) 455.406 Trials 17664
ElapsedTime(s) 30411 EstimatedLatency(ms) 455.406 Trials 17728
ElapsedTime(s) 30600 EstimatedLatency(ms) 455.406 Trials 17792
ElapsedTime(s) 30706 EstimatedLatency(ms) 455.406 Trials 17856
ElapsedTime(s) 30764 EstimatedLatency(ms) 455.406 Trials 17920
ElapsedTime(s) 30964 EstimatedLatency(ms) 455.406 Trials 17984
ElapsedTime(s) 31076 EstimatedLatency(ms) 454.799 Trials 18048
ElapsedTime(s) 31174 EstimatedLatency(ms) 454.799 Trials 18112
ElapsedTime(s) 31288 EstimatedLatency(ms) 454.799 Trials 18176
ElapsedTime(s) 31342 EstimatedLatency(ms) 454.799 Trials 18240
ElapsedTime(s) 31402 EstimatedLatency(ms) 454.502 Trials 18304
ElapsedTime(s) 31598 EstimatedLatency(ms) 454.502 Trials 18368
ElapsedTime(s) 31693 EstimatedLatency(ms) 454.435 Trials 18432
ElapsedTime(s) 31887 EstimatedLatency(ms) 454.435 Trials 18496
ElapsedTime(s) 31955 EstimatedLatency(ms) 454.435 Trials 18560
ElapsedTime(s) 32063 EstimatedLatency(ms) 454.435 Trials 18624
ElapsedTime(s) 32164 EstimatedLatency(ms) 454.435 Trials 18688
ElapsedTime(s) 32362 EstimatedLatency(ms) 453.925 Trials 18752
ElapsedTime(s) 32560 EstimatedLatency(ms) 453.769 Trials 18816
ElapsedTime(s) 32751 EstimatedLatency(ms) 453.270 Trials 18880
ElapsedTime(s) 32948 EstimatedLatency(ms) 453.270 Trials 18944
ElapsedTime(s) 33044 EstimatedLatency(ms) 453.270 Trials 19008
ElapsedTime(s) 33100 EstimatedLatency(ms) 453.270 Trials 19072
ElapsedTime(s) 33215 EstimatedLatency(ms) 453.270 Trials 19136
ElapsedTime(s) 33281 EstimatedLatency(ms) 453.086 Trials 19200
ElapsedTime(s) 33481 EstimatedLatency(ms) 453.086 Trials 19264
ElapsedTime(s) 33576 EstimatedLatency(ms) 453.086 Trials 19328
ElapsedTime(s) 33677 EstimatedLatency(ms) 452.989 Trials 19392
ElapsedTime(s) 33741 EstimatedLatency(ms) 452.989 Trials 19456
ElapsedTime(s) 33801 EstimatedLatency(ms) 452.989 Trials 19520
ElapsedTime(s) 33873 EstimatedLatency(ms) 452.989 Trials 19584
ElapsedTime(s) 33975 EstimatedLatency(ms) 452.989 Trials 19648
ElapsedTime(s) 34174 EstimatedLatency(ms) 452.989 Trials 19712
ElapsedTime(s) 34221 EstimatedLatency(ms) 452.103 Trials 19776
ElapsedTime(s) 34326 EstimatedLatency(ms) 452.103 Trials 19840
ElapsedTime(s) 34439 EstimatedLatency(ms) 452.103 Trials 19904
ElapsedTime(s) 34557 EstimatedLatency(ms) 452.103 Trials 19968
ElapsedTime(s) 34615 EstimatedLatency(ms) 452.103 Trials 20032
```
I could have pressed Ctrl-C, but I wanted the model to improve as much as possible.
I waited a full day for the auto-tuning process to finish, and the results didn't improve much :frowning:
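To put numbers on that plateau, the quoted rows can be parsed and summarized. This is a minimal sketch, assuming each row has the whitespace-separated layout shown in the excerpt; `parse_row` and `summarize` are hypothetical helper names, not part of TVM, and the three sample rows are copied from the excerpt.

```python
def parse_row(line):
    """Parse one row such as
    'ElapsedTime(s) 17787 EstimatedLatency(ms) 471.479 Trials 10176'."""
    parts = line.split()
    # Fields alternate: label, value, label, value, ...
    fields = dict(zip(parts[0::2], parts[1::2]))
    return {
        "elapsed_s": int(fields["ElapsedTime(s)"]),
        "latency_ms": float(fields["EstimatedLatency(ms)"]),
        "trials": int(fields["Trials"]),
    }


def summarize(rows):
    """Compare the first and last rows of a parsed excerpt."""
    first, last = rows[0], rows[-1]
    gain_ms = first["latency_ms"] - last["latency_ms"]
    return {
        "trials_spent": last["trials"] - first["trials"],
        "hours_spent": (last["elapsed_s"] - first["elapsed_s"]) / 3600,
        "gain_ms": gain_ms,
        "gain_pct": 100 * gain_ms / first["latency_ms"],
    }


# First, middle and last rows of the quoted excerpt.
sample = """\
ElapsedTime(s) 17787 EstimatedLatency(ms) 471.479 Trials 10176
ElapsedTime(s) 26174 EstimatedLatency(ms) 462.095 Trials 15232
ElapsedTime(s) 34615 EstimatedLatency(ms) 452.103 Trials 20032
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
summary = summarize(rows)
print(summary)
```

On these rows, the last 9856 trials (about 4.7 hours) lowered the estimated latency by roughly 19 ms, or about 4%.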
---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/6) to respond.
You are receiving this because you enabled mailing list mode.
To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/32c4e734d96966253672903905e86123fcaa0777bff864d0cfd30ead760a6cfc).
Posted by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai>.
Yep, on the first line (trial 1856), `EstimatedLatency(ms)` is 1063.960.
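For scale, that first-row estimate can be set against the two ends of the excerpt quoted earlier in the thread. This is a minimal sketch using only figures reported in the thread, and assuming the first-row value is in milliseconds like the rest of the `EstimatedLatency(ms)` column; `speedup` is a hypothetical helper.

```python
def speedup(before_ms, after_ms):
    """Ratio of two latency estimates (values above 1 mean 'got faster')."""
    return before_ms / after_ms


first_row_ms = 1063.960     # trial 1856, first row of the full file
excerpt_start_ms = 471.479  # trial 10176, first quoted row
last_row_ms = 452.103       # trial 20032, last quoted row

early = speedup(first_row_ms, excerpt_start_ms)
late = speedup(excerpt_start_ms, last_row_ms)

# Fraction of the total estimated-latency gain reached by trial 10176.
share_early = (first_row_ms - excerpt_start_ms) / (first_row_ms - last_row_ms)

print(f"trials 1856 -> 10176:  {early:.2f}x")
print(f"trials 10176 -> 20032: {late:.2f}x")
print(f"share of total gain before trial 10176: {share_early:.1%}")
```

Under these figures, roughly 97% of the total estimated-latency gain had already been reached by trial 10176.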
---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/10) to respond.
Posted by Andrey Malyshev via Apache TVM Discuss <no...@discuss.tvm.ai>.
[quote="namduc, post:8, topic:10889"]
What are the “first lines” you mentioned here?
[/quote]
The quoted part of the tuning trace starts from `ElapsedTime(s) 17787 EstimatedLatency(ms) 471.479 Trials 10176`, which corresponds to trial 10176, so that is not what I meant by "first lines". If you look at the full file, its first line should start at trial 29*64=1856, and performance should have improved significantly between trial 1856 and trial 10176.
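The arithmetic behind that first row can be checked directly. This is a minimal sketch, assuming (as the post implies) that each tuning round measures 64 candidates and that the first row is written only after each of the 29 tasks has had one round; the 64-trial step matches the spacing of the `Trials` column in the quoted excerpt.

```python
NUM_TASKS = 29           # layers/tasks mentioned in the thread
MEASURES_PER_ROUND = 64  # step between consecutive "Trials" values in the log

# One round per task before the first total-latency estimate is logged.
first_logged_trial = NUM_TASKS * MEASURES_PER_ROUND
print(first_logged_trial)  # 1856

# Consecutive rows from the quoted excerpt advance by exactly one round.
trials = [10176, 10240, 10304, 10368]
steps = [b - a for a, b in zip(trials, trials[1:])]
print(steps)  # [64, 64, 64]

# Number of log rows between the first logged trial and the excerpt start.
rows_before_excerpt = (trials[0] - first_logged_trial) // MEASURES_PER_ROUND
print(rows_before_excerpt)  # 130
```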
---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/9) to respond.
Posted by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai>.
[quote="elvin-n, post:7, topic:10889"]
Is it the same model as in the beginning?
[/quote]
The pure PyTorch model runs inference in 0.7795 s, so the TVM model after tuning (0.5078 s) is faster by about 0.27 s.
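The three timings reported in the thread can be lined up side by side. This is a minimal sketch that only does arithmetic on those reported numbers; the dictionary keys are illustrative labels, not names from any library.

```python
# Inference times reported in the thread, in seconds.
timings_s = {
    "pytorch": 0.7795,
    "onnxruntime": 0.4735,
    "tvm_tuned": 0.5078,
}

tvm = timings_s["tvm_tuned"]

# Time saved by the tuned TVM model relative to pure PyTorch.
saved_vs_pytorch_s = timings_s["pytorch"] - tvm

# Remaining gap between the tuned TVM model and onnxruntime.
gap_vs_ort_s = tvm - timings_s["onnxruntime"]
gap_vs_ort_pct = 100 * gap_vs_ort_s / timings_s["onnxruntime"]

for name, seconds in timings_s.items():
    print(f"{name:12s} {seconds:.4f} s  ({seconds / tvm:.2f}x the TVM time)")
print(f"saved vs PyTorch:   {saved_vs_pytorch_s:.4f} s")
print(f"gap vs onnxruntime: {gap_vs_ort_s:.4f} s ({gap_vs_ort_pct:.1f}%)")
```

On these numbers the tuned TVM model is about 0.27 s faster than PyTorch and about 7% slower than onnxruntime.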
[quote="elvin-n, post:7, topic:10889"]
At the same time, I am pretty sure that if we look at the first lines, the results improved significantly during tuning.
[/quote]
What are the "first lines" you mentioned here?
I don't quite understand this sentence.
---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/8) to respond.
Posted by Andrey Malyshev via Apache TVM Discuss <no...@discuss.tvm.ai>.
[quote="namduc, post:6, topic:10889"]
after that i got the same result with a model onnx running on onnxruntime
[/quote]
It might be that onnxruntime already uses the hardware resources very efficiently, so improving the inference time further is possible but may be hard, and TVM reaches the same near-optimal result. It's hard to say without looking into the model. Is it a publicly available model? Does it have more conv layers or more matmul/dense layers?
[quote="namduc, post:6, topic:10889"]
and the results didn’t improve much
[/quote]
Is it the same model as in the beginning? If it is, there is progress compared with the earlier TVM results: from 3.4 s to 0.5 s. As for the quoted part of the tsv file, I see only the range from about 10000 to 20000 trials. Tuning could probably have been stopped at that 10000th trial or earlier. At the same time, I am pretty sure that if we look at the first lines, the results improved significantly during tuning.
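The point about stopping earlier can be illustrated with a simple patience rule applied to the logged rows. This is a generic sketch, not TVM's own stopping logic: `stop_point`, the patience window, and the gain thresholds are all assumptions chosen for illustration, and the six rows are copied from the quoted excerpt.

```python
def stop_point(rows, patience_trials, min_gain_ms):
    """Return the trial count at which a patience rule would have stopped.

    rows: list of (trials, latency_ms) tuples in log order.
    The rule fires at the first row whose latency is less than `min_gain_ms`
    better than the latest row that is at least `patience_trials` older.
    Returns None if the rule never fires.
    """
    for i, (trials, latency) in enumerate(rows):
        # Latencies of earlier rows that are old enough to compare against.
        older = [lat for t, lat in rows[:i] if trials - t >= patience_trials]
        if older and older[-1] - latency < min_gain_ms:
            return trials
    return None


# (Trials, EstimatedLatency in ms), sampled about every 2000 trials.
rows = [
    (10176, 471.479),
    (12160, 465.989),
    (14080, 464.773),
    (16000, 460.739),
    (18048, 454.799),
    (20032, 452.103),
]

strict = stop_point(rows, patience_trials=1900, min_gain_ms=2.0)
lenient = stop_point(rows, patience_trials=1900, min_gain_ms=1.0)
print(strict)   # 14080
print(lenient)  # None
```

Whether stopping is worthwhile depends entirely on the threshold: requiring 2 ms of gain per roughly 1900 trials would have stopped at trial 14080, while requiring only 1 ms would have let the run continue to the end of the excerpt.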
---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/7) to respond.