Posted to discuss-archive@tvm.apache.org by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/08/27 15:08:17 UTC

[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster


I tried increasing `num_measure_trials` to 20000. The model has 29 layers, so the number of trials works out to roughly 800*29=23200, and I chose 20000. After that I got about the same result as the ONNX model running on onnxruntime:
```
onnxruntime model: 0.4735 s
tvm model after tuning: 0.5078 s
```
Is there any way to make it run faster?
Here is the tail of `total_latency.tsv`:
```
ElapsedTime(s)	17787	EstimatedLatency(ms)	471.479	Trials	10176
ElapsedTime(s)	17837	EstimatedLatency(ms)	471.479	Trials	10240
ElapsedTime(s)	18032	EstimatedLatency(ms)	471.479	Trials	10304
ElapsedTime(s)	18128	EstimatedLatency(ms)	471.479	Trials	10368
ElapsedTime(s)	18224	EstimatedLatency(ms)	471.442	Trials	10432
ElapsedTime(s)	18275	EstimatedLatency(ms)	470.814	Trials	10496
ElapsedTime(s)	18328	EstimatedLatency(ms)	469.353	Trials	10560
ElapsedTime(s)	18379	EstimatedLatency(ms)	468.845	Trials	10624
ElapsedTime(s)	18461	EstimatedLatency(ms)	468.845	Trials	10688
ElapsedTime(s)	18658	EstimatedLatency(ms)	468.845	Trials	10752
ElapsedTime(s)	18692	EstimatedLatency(ms)	467.702	Trials	10816
ElapsedTime(s)	18738	EstimatedLatency(ms)	467.702	Trials	10880
ElapsedTime(s)	18934	EstimatedLatency(ms)	467.702	Trials	10944
ElapsedTime(s)	18980	EstimatedLatency(ms)	467.702	Trials	11008
ElapsedTime(s)	19076	EstimatedLatency(ms)	467.702	Trials	11072
ElapsedTime(s)	19164	EstimatedLatency(ms)	467.702	Trials	11136
ElapsedTime(s)	19360	EstimatedLatency(ms)	467.702	Trials	11200
ElapsedTime(s)	19451	EstimatedLatency(ms)	467.702	Trials	11264
ElapsedTime(s)	19645	EstimatedLatency(ms)	467.702	Trials	11328
ElapsedTime(s)	19697	EstimatedLatency(ms)	467.702	Trials	11392
ElapsedTime(s)	19785	EstimatedLatency(ms)	467.702	Trials	11456
ElapsedTime(s)	19886	EstimatedLatency(ms)	467.702	Trials	11520
ElapsedTime(s)	19946	EstimatedLatency(ms)	467.702	Trials	11584
ElapsedTime(s)	20138	EstimatedLatency(ms)	467.702	Trials	11648
ElapsedTime(s)	20199	EstimatedLatency(ms)	467.702	Trials	11712
ElapsedTime(s)	20402	EstimatedLatency(ms)	467.702	Trials	11776
ElapsedTime(s)	20459	EstimatedLatency(ms)	466.907	Trials	11840
ElapsedTime(s)	20544	EstimatedLatency(ms)	466.907	Trials	11904
ElapsedTime(s)	20658	EstimatedLatency(ms)	466.907	Trials	11968
ElapsedTime(s)	20751	EstimatedLatency(ms)	466.907	Trials	12032
ElapsedTime(s)	20945	EstimatedLatency(ms)	466.907	Trials	12096
ElapsedTime(s)	20987	EstimatedLatency(ms)	465.989	Trials	12160
ElapsedTime(s)	21176	EstimatedLatency(ms)	465.989	Trials	12224
ElapsedTime(s)	21219	EstimatedLatency(ms)	465.989	Trials	12288
ElapsedTime(s)	21326	EstimatedLatency(ms)	465.989	Trials	12352
ElapsedTime(s)	21423	EstimatedLatency(ms)	465.989	Trials	12416
ElapsedTime(s)	21472	EstimatedLatency(ms)	465.989	Trials	12480
ElapsedTime(s)	21665	EstimatedLatency(ms)	465.989	Trials	12544
ElapsedTime(s)	21720	EstimatedLatency(ms)	465.989	Trials	12608
ElapsedTime(s)	21784	EstimatedLatency(ms)	465.747	Trials	12672
ElapsedTime(s)	21877	EstimatedLatency(ms)	465.747	Trials	12736
ElapsedTime(s)	22071	EstimatedLatency(ms)	465.747	Trials	12800
ElapsedTime(s)	22181	EstimatedLatency(ms)	465.747	Trials	12864
ElapsedTime(s)	22274	EstimatedLatency(ms)	465.488	Trials	12928
ElapsedTime(s)	22465	EstimatedLatency(ms)	464.841	Trials	12992
ElapsedTime(s)	22522	EstimatedLatency(ms)	464.773	Trials	13056
ElapsedTime(s)	22719	EstimatedLatency(ms)	464.773	Trials	13120
ElapsedTime(s)	22807	EstimatedLatency(ms)	464.773	Trials	13184
ElapsedTime(s)	23010	EstimatedLatency(ms)	464.773	Trials	13248
ElapsedTime(s)	23063	EstimatedLatency(ms)	464.773	Trials	13312
ElapsedTime(s)	23162	EstimatedLatency(ms)	464.773	Trials	13376
ElapsedTime(s)	23264	EstimatedLatency(ms)	464.773	Trials	13440
ElapsedTime(s)	23465	EstimatedLatency(ms)	464.773	Trials	13504
ElapsedTime(s)	23533	EstimatedLatency(ms)	464.773	Trials	13568
ElapsedTime(s)	23633	EstimatedLatency(ms)	464.773	Trials	13632
ElapsedTime(s)	23688	EstimatedLatency(ms)	464.773	Trials	13696
ElapsedTime(s)	23778	EstimatedLatency(ms)	464.773	Trials	13760
ElapsedTime(s)	23865	EstimatedLatency(ms)	464.773	Trials	13824
ElapsedTime(s)	24060	EstimatedLatency(ms)	464.773	Trials	13888
ElapsedTime(s)	24111	EstimatedLatency(ms)	464.773	Trials	13952
ElapsedTime(s)	24158	EstimatedLatency(ms)	464.773	Trials	14016
ElapsedTime(s)	24355	EstimatedLatency(ms)	464.773	Trials	14080
ElapsedTime(s)	24455	EstimatedLatency(ms)	464.773	Trials	14144
ElapsedTime(s)	24652	EstimatedLatency(ms)	464.773	Trials	14208
ElapsedTime(s)	24749	EstimatedLatency(ms)	464.773	Trials	14272
ElapsedTime(s)	24844	EstimatedLatency(ms)	464.773	Trials	14336
ElapsedTime(s)	24922	EstimatedLatency(ms)	464.773	Trials	14400
ElapsedTime(s)	25119	EstimatedLatency(ms)	464.773	Trials	14464
ElapsedTime(s)	25219	EstimatedLatency(ms)	464.773	Trials	14528
ElapsedTime(s)	25305	EstimatedLatency(ms)	464.773	Trials	14592
ElapsedTime(s)	25356	EstimatedLatency(ms)	464.773	Trials	14656
ElapsedTime(s)	25555	EstimatedLatency(ms)	464.773	Trials	14720
ElapsedTime(s)	25610	EstimatedLatency(ms)	464.109	Trials	14784
ElapsedTime(s)	25656	EstimatedLatency(ms)	463.423	Trials	14848
ElapsedTime(s)	25703	EstimatedLatency(ms)	463.383	Trials	14912
ElapsedTime(s)	25784	EstimatedLatency(ms)	463.383	Trials	14976
ElapsedTime(s)	25982	EstimatedLatency(ms)	463.383	Trials	15040
ElapsedTime(s)	26039	EstimatedLatency(ms)	463.383	Trials	15104
ElapsedTime(s)	26131	EstimatedLatency(ms)	463.383	Trials	15168
ElapsedTime(s)	26174	EstimatedLatency(ms)	462.095	Trials	15232
ElapsedTime(s)	26370	EstimatedLatency(ms)	461.838	Trials	15296
ElapsedTime(s)	26464	EstimatedLatency(ms)	461.647	Trials	15360
ElapsedTime(s)	26667	EstimatedLatency(ms)	461.647	Trials	15424
ElapsedTime(s)	26723	EstimatedLatency(ms)	460.820	Trials	15488
ElapsedTime(s)	26777	EstimatedLatency(ms)	460.820	Trials	15552
ElapsedTime(s)	26856	EstimatedLatency(ms)	460.820	Trials	15616
ElapsedTime(s)	26960	EstimatedLatency(ms)	460.820	Trials	15680
ElapsedTime(s)	27057	EstimatedLatency(ms)	460.820	Trials	15744
ElapsedTime(s)	27256	EstimatedLatency(ms)	460.820	Trials	15808
ElapsedTime(s)	27352	EstimatedLatency(ms)	460.820	Trials	15872
ElapsedTime(s)	27553	EstimatedLatency(ms)	460.820	Trials	15936
ElapsedTime(s)	27610	EstimatedLatency(ms)	460.739	Trials	16000
ElapsedTime(s)	27658	EstimatedLatency(ms)	460.739	Trials	16064
ElapsedTime(s)	27760	EstimatedLatency(ms)	460.739	Trials	16128
ElapsedTime(s)	27803	EstimatedLatency(ms)	460.502	Trials	16192
ElapsedTime(s)	28003	EstimatedLatency(ms)	460.123	Trials	16256
ElapsedTime(s)	28062	EstimatedLatency(ms)	459.634	Trials	16320
ElapsedTime(s)	28171	EstimatedLatency(ms)	459.381	Trials	16384
ElapsedTime(s)	28371	EstimatedLatency(ms)	457.684	Trials	16448
ElapsedTime(s)	28569	EstimatedLatency(ms)	457.684	Trials	16512
ElapsedTime(s)	28765	EstimatedLatency(ms)	457.684	Trials	16576
ElapsedTime(s)	28854	EstimatedLatency(ms)	457.684	Trials	16640
ElapsedTime(s)	29043	EstimatedLatency(ms)	457.570	Trials	16704
ElapsedTime(s)	29151	EstimatedLatency(ms)	457.570	Trials	16768
ElapsedTime(s)	29244	EstimatedLatency(ms)	457.570	Trials	16832
ElapsedTime(s)	29315	EstimatedLatency(ms)	457.570	Trials	16896
ElapsedTime(s)	29373	EstimatedLatency(ms)	457.570	Trials	16960
ElapsedTime(s)	29461	EstimatedLatency(ms)	457.570	Trials	17024
ElapsedTime(s)	29561	EstimatedLatency(ms)	457.570	Trials	17088
ElapsedTime(s)	29622	EstimatedLatency(ms)	456.816	Trials	17152
ElapsedTime(s)	29717	EstimatedLatency(ms)	456.667	Trials	17216
ElapsedTime(s)	29814	EstimatedLatency(ms)	456.667	Trials	17280
ElapsedTime(s)	29864	EstimatedLatency(ms)	456.667	Trials	17344
ElapsedTime(s)	29907	EstimatedLatency(ms)	455.467	Trials	17408
ElapsedTime(s)	29999	EstimatedLatency(ms)	455.406	Trials	17472
ElapsedTime(s)	30085	EstimatedLatency(ms)	455.406	Trials	17536
ElapsedTime(s)	30286	EstimatedLatency(ms)	455.406	Trials	17600
ElapsedTime(s)	30337	EstimatedLatency(ms)	455.406	Trials	17664
ElapsedTime(s)	30411	EstimatedLatency(ms)	455.406	Trials	17728
ElapsedTime(s)	30600	EstimatedLatency(ms)	455.406	Trials	17792
ElapsedTime(s)	30706	EstimatedLatency(ms)	455.406	Trials	17856
ElapsedTime(s)	30764	EstimatedLatency(ms)	455.406	Trials	17920
ElapsedTime(s)	30964	EstimatedLatency(ms)	455.406	Trials	17984
ElapsedTime(s)	31076	EstimatedLatency(ms)	454.799	Trials	18048
ElapsedTime(s)	31174	EstimatedLatency(ms)	454.799	Trials	18112
ElapsedTime(s)	31288	EstimatedLatency(ms)	454.799	Trials	18176
ElapsedTime(s)	31342	EstimatedLatency(ms)	454.799	Trials	18240
ElapsedTime(s)	31402	EstimatedLatency(ms)	454.502	Trials	18304
ElapsedTime(s)	31598	EstimatedLatency(ms)	454.502	Trials	18368
ElapsedTime(s)	31693	EstimatedLatency(ms)	454.435	Trials	18432
ElapsedTime(s)	31887	EstimatedLatency(ms)	454.435	Trials	18496
ElapsedTime(s)	31955	EstimatedLatency(ms)	454.435	Trials	18560
ElapsedTime(s)	32063	EstimatedLatency(ms)	454.435	Trials	18624
ElapsedTime(s)	32164	EstimatedLatency(ms)	454.435	Trials	18688
ElapsedTime(s)	32362	EstimatedLatency(ms)	453.925	Trials	18752
ElapsedTime(s)	32560	EstimatedLatency(ms)	453.769	Trials	18816
ElapsedTime(s)	32751	EstimatedLatency(ms)	453.270	Trials	18880
ElapsedTime(s)	32948	EstimatedLatency(ms)	453.270	Trials	18944
ElapsedTime(s)	33044	EstimatedLatency(ms)	453.270	Trials	19008
ElapsedTime(s)	33100	EstimatedLatency(ms)	453.270	Trials	19072
ElapsedTime(s)	33215	EstimatedLatency(ms)	453.270	Trials	19136
ElapsedTime(s)	33281	EstimatedLatency(ms)	453.086	Trials	19200
ElapsedTime(s)	33481	EstimatedLatency(ms)	453.086	Trials	19264
ElapsedTime(s)	33576	EstimatedLatency(ms)	453.086	Trials	19328
ElapsedTime(s)	33677	EstimatedLatency(ms)	452.989	Trials	19392
ElapsedTime(s)	33741	EstimatedLatency(ms)	452.989	Trials	19456
ElapsedTime(s)	33801	EstimatedLatency(ms)	452.989	Trials	19520
ElapsedTime(s)	33873	EstimatedLatency(ms)	452.989	Trials	19584
ElapsedTime(s)	33975	EstimatedLatency(ms)	452.989	Trials	19648
ElapsedTime(s)	34174	EstimatedLatency(ms)	452.989	Trials	19712
ElapsedTime(s)	34221	EstimatedLatency(ms)	452.103	Trials	19776
ElapsedTime(s)	34326	EstimatedLatency(ms)	452.103	Trials	19840
ElapsedTime(s)	34439	EstimatedLatency(ms)	452.103	Trials	19904
ElapsedTime(s)	34557	EstimatedLatency(ms)	452.103	Trials	19968
ElapsedTime(s)	34615	EstimatedLatency(ms)	452.103	Trials	20032
```
I could have pressed Ctrl-C, but I wanted the model to improve as much as possible.
I waited a full day for the auto-tuning process to finish, and the results didn't improve much :(
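
To put a number on "didn't improve much", the rows of `total_latency.tsv` can be parsed as tab-separated key/value pairs. This is only a sketch: the three sample rows are copied from the excerpt above, and `parse_row` is a hypothetical helper, not part of TVM.

```python
# Sketch: parse rows in the format shown above (tab-separated key/value pairs).
# The sample rows are copied from the log excerpt; parse_row is a hypothetical helper.
SAMPLE = (
    "ElapsedTime(s)\t17787\tEstimatedLatency(ms)\t471.479\tTrials\t10176\n"
    "ElapsedTime(s)\t26174\tEstimatedLatency(ms)\t462.095\tTrials\t15232\n"
    "ElapsedTime(s)\t34615\tEstimatedLatency(ms)\t452.103\tTrials\t20032\n"
)


def parse_row(line):
    """Turn one row into a dict by pairing each key with the value that follows it."""
    fields = line.rstrip("\n").split("\t")
    return {key: float(value) for key, value in zip(fields[0::2], fields[1::2])}


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
first, last = rows[0], rows[-1]

# Compare the first and last parsed rows.
gain_ms = first["EstimatedLatency(ms)"] - last["EstimatedLatency(ms)"]
gain_pct = 100.0 * gain_ms / first["EstimatedLatency(ms)"]
hours = (last["ElapsedTime(s)"] - first["ElapsedTime(s)"]) / 3600.0
trials = int(last["Trials"] - first["Trials"])

print(f"{trials} trials over {hours:.2f} h: {gain_ms:.3f} ms ({gain_pct:.2f}%) faster")
```

For the quoted tail this comes to roughly a 4% lower estimated latency across the last 9856 trials. To run it on the real file, read the lines from `total_latency.tsv` instead of `SAMPLE`.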





---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/6) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/32c4e734d96966253672903905e86123fcaa0777bff864d0cfd30ead760a6cfc).

[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster

Posted by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai>.

Yep, the first line is at trial 1856, where `EstimatedLatency(ms)` is 1063.960 ms.
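
As a rough check of how the gain is split between the early and late trials, the three data points from this thread can be compared directly. This is a sketch under one assumption: the 1063.960 figure is in milliseconds, as the column name suggests. `relative_gain` is a hypothetical helper.

```python
# Latencies in ms; the first value is assumed to be in ms, as the column name suggests.
checkpoints = [
    (1856, 1063.960),   # first line of the full file, as reported above
    (10176, 471.479),   # first row of the quoted tail
    (20032, 452.103),   # last row of the quoted tail
]


def relative_gain(before_ms, after_ms):
    """Fraction of latency removed going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms


# Walk over consecutive checkpoint pairs and report the relative improvement.
for (t0, l0), (t1, l1) in zip(checkpoints, checkpoints[1:]):
    print(f"trials {t0:>5} -> {t1:>5}: {100 * relative_gain(l0, l1):.1f}% lower latency")
```

Under that assumption, most of the improvement (more than half of the initial latency) happens before trial 10176, and the second half of the budget adds only a few percent.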





---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/10) to respond.


[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster

Posted by Andrey Malyshev via Apache TVM Discuss <no...@discuss.tvm.ai>.

[quote="namduc, post:8, topic:10889"]
what are the “first lines” you mentioned here?
[/quote]
The quote shows a part of the tuning trace starting from `ElapsedTime(s)	17787	EstimatedLatency(ms)	471.479	Trials	10176`, which corresponds to trial 10176. By "first lines" I meant the start of the full file, where the first line should be at trial 29*64=1856. Performance should improve significantly between trial 1856 and trial 10176.





---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/9) to respond.


[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster

Posted by Nam Nguyen Duc via Apache TVM Discuss <no...@discuss.tvm.ai>.

[quote="elvin-n, post:7, topic:10889"]
Is it the same model as in the beginning? 
[/quote]
Pure PyTorch inference of the model takes 0.7795 s; compared with that, the TVM model after tuning is faster by about 0.2 s.

[quote="elvin-n, post:7, topic:10889"]
At the same time, I am pretty sure that if we look at the first lines, we will see that results improved significantly during tuning.
[/quote]
What are the "first lines" you mentioned here? I don't quite understand this sentence.





---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/8) to respond.


[Apache TVM Discuss] [Questions] Compiling model with target="llvm" not faster

Posted by Andrey Malyshev via Apache TVM Discuss <no...@discuss.tvm.ai>.

[quote="namduc, post:6, topic:10889"]
after that i got the same result with a model onnx running on onnxruntime
[/quote]
It might be that onnxruntime already uses the hardware resources in the most efficient way, so improving the inference time further is possible but might be hard, and TVM reaches the same near-optimal result. It's hard to say without looking into the model. Is it a publicly available model? Does it have more conv layers or more matmul/dense layers?

[quote="namduc, post:6, topic:10889"]
and the results didn’t improve much
[/quote]
Is it the same model as in the beginning? If it is the same, there is progress compared to the earlier TVM results: from 3.4 s to 0.5 s. As for the quote from the tsv file, I see only the part from 10000 to 20000 trials. Tuning probably could have been stopped at around the 10000th trial or earlier. At the same time, I am pretty sure that if we look at the first lines, we will see that results improved significantly during tuning.
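
One way to see where tuning could have been stopped is to look for the point where the estimated latency flattens out. The sketch below is a hypothetical helper, not a TVM mechanism. It uses a handful of rows copied from the quoted tail, and the window and threshold values are arbitrary assumptions.

```python
# Hypothetical helper, not part of TVM: find where tuning progress flattens out.
def first_plateau(history, window, min_gain):
    """Return the trial count of the first row after which the next `window`
    rows reduce latency by less than `min_gain` (a fraction), or None."""
    for i in range(len(history) - window):
        trials, latency = history[i]
        later = history[i + window][1]
        if (latency - later) / latency < min_gain:
            return trials
    return None


# (trials, estimated latency in ms): a few rows copied from the quoted tail
history = [
    (10176, 471.479),
    (12160, 465.989),
    (14016, 464.773),
    (16000, 460.739),
    (18048, 454.799),
    (20032, 452.103),
]

# Report the first sampled row after which the next row gains less than 1%.
print(first_plateau(history, window=1, min_gain=0.01))
```

The answer depends heavily on the chosen threshold and on how densely the log is sampled, so treat it as a way to eyeball diminishing returns, not as a stopping criterion. I believe `auto_scheduler.TuningOptions` also accepts an `early_stopping` argument, but its exact semantics should be checked in the TVM documentation.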





---
[Visit Topic](https://discuss.tvm.apache.org/t/compiling-model-with-target-llvm-not-faster/10889/7) to respond.
