Posted to discuss-archive@tvm.apache.org by nolan via TVM Discuss <no...@discuss.tvm.ai> on 2020/08/31 02:43:50 UTC
[TVM Discuss] [Questions] Performance of same op and workload in different model varies differently
I compared two similar BERT models running on CPU with TVM: one converted from PyTorch, the other from MXNet. Because of the large performance gap, I did some profiling. The results show that the run time of the same op (matmul) with the same workload varies widely between the two models.
ENV:
1. TVM: built with MKL.
2. Intel CPU
3. OpenMP: `KMP_AFFINITY=compact,1,0 OMP_NUM_THREADS=24`
Model inference time:
# mxnet model
TVM Mean inference time: 5.53 ms
# pytorch model
TVM Mean inference time: 23.05 ms
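For reference, these timings were collected roughly as follows; this is a minimal sketch assuming `graph`, `lib` and `params` come from `relay.build()` of each model, and `data`/`input_data` are placeholder names:

    import tvm
    from tvm.contrib import graph_runtime

    # Assumption: graph/lib/params were produced by relay.build() for the
    # converted BERT model; everything runs on a single CPU context.
    ctx = tvm.cpu(0)
    module = graph_runtime.create(graph, lib, ctx)
    module.set_input(**params)
    module.set_input("data", input_data)  # placeholder input name/tensor

    # Average over repeated runs to get a stable mean latency.
    ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
    print("TVM Mean inference time: %.2f ms" % (ftimer().mean * 1000))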
Profiling result:
# MXNet model
Node Name              Ops                   Time(us)  Time(%)  Shape      Inputs  Outputs
---------------------  --------------------  --------  -------  ---------  ------  -------
fused_nn_dense_add_15  fused_nn_dense_add_1  308.926   5.58     (32, 768)  3       1
fused_nn_dense_add_11  fused_nn_dense_add_1  307.277   5.551    (32, 768)  3       1
# PyTorch Model
Node Name              Ops                   Time(us)  Time(%)  Shape      Inputs  Outputs
---------------------  --------------------  --------  -------  ---------  ------  -------
fused_nn_dense_add_3   fused_nn_dense_add_3  1783.75   7.631    (32, 768)  3       1
fused_nn_dense_add_31  fused_nn_dense_add_3  1593.08   6.815    (32, 768)  3       1
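A per-node breakdown like the one above can be obtained with TVM's debug runtime, which runs the graph node by node and prints exactly this kind of table; a minimal sketch, reusing `graph`/`lib`/`params` from above:

    from tvm.contrib.debugger import debug_runtime

    # The debug runtime dumps per-op execution times
    # (Node Name / Ops / Time(us) / Time(%) / Shape / Inputs / Outputs).
    m = debug_runtime.create(graph, lib, ctx)
    m.set_input(**params)
    m.set_input("data", input_data)  # placeholder input name/tensor
    m.run()  # executes the graph and prints the profiling table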
IR code (identical for the PyTorch and MXNet models):
attr [0] "compute_scope" = "fused_nn_dense_add_3_compute_";
attr [C: handle] "storage_scope" = "global";
allocate(C, float32, [24576]) {
  attr [0] "extern_scope" = 0;
  @tir.tvm_call_packed("tvm.contrib.cblas.matmul", @tir.tvm_stack_make_array(placeholder, @tir.tvm_stack_make_shape(32, 3072, dtype=handle), 0, 2, 0f32, 0, dtype=handle), @tir.tvm_stack_make_array(placeholder_1, @tir.tvm_stack_make_shape(768, 3072, dtype=handle), 0, 2, 0f32, 0, dtype=handle), @tir.tvm_stack_make_array(C, @tir.tvm_stack_make_shape(32, 768, dtype=handle), 0, 2, 0f32, 0, dtype=handle), False, True, dtype=int32)
  for (ax0: int32, 0, 32) "parallel" {
    for (ax1: int32, 0, 768) {
      T_add[((ax0*768) + ax1)] = ((float32*)C[((ax0*768) + ax1)] + (float32*)placeholder_2[ax1])
    }
  }
}
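The `tvm.contrib.cblas.matmul` packed call shows that `nn.dense` is offloaded to MKL's BLAS, and only the small bias-add loop runs on TVM's own parallel runtime. For context, a minimal sketch of a build that produces this lowering, assuming a `cblas`-enabled target (`mod`/`params` from the frontend converters; the `-mcpu` value is a placeholder):

    import tvm
    from tvm import relay

    # Assumption: mod/params come from relay.frontend.from_pytorch() or
    # relay.frontend.from_mxnet(). "-libs=cblas" routes nn.dense to the
    # BLAS (here MKL) implementation seen in the IR above.
    target = "llvm -mcpu=skylake-avx512 -libs=cblas"
    with tvm.transform.PassContext(opt_level=3):
        graph, lib, params = relay.build(mod, target=target, params=params)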
However, when setting `OMP_NUM_THREADS=1`, the inference times of the two models are the same, so this seems to be a problem with multiple threads.
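To make the single-thread comparison reproducible, the thread settings have to be pinned before any of the frameworks initializes its OpenMP runtime; a sketch, assuming everything runs in one Python script:

    import os

    # Must be set before importing tvm/torch/mxnet: once an OpenMP
    # runtime has initialized, changing these variables has no effect.
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["KMP_AFFINITY"] = "compact,1,0"

    import tvm  # imported only after the environment is pinned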
What may cause the difference?
Refer to: https://github.com/apache/incubator-tvm/issues/6354
---
Posted by Chenfan via TVM Discuss <no...@discuss.tvm.ai>.
> However, when setting `OMP_NUM_THREADS=1`, the inference times of the two models are the same, so this seems to be a problem with multiple threads.
Could there be any thread-related limitation in your PyTorch script?
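One quick way to check is to print the intra-op thread count that PyTorch picked up when the model was traced; a purely diagnostic sketch, not part of your scripts:

    import torch

    # PyTorch ships its own OpenMP, so importing it (e.g. to trace the
    # model) may initialize a threading runtime before TVM/MKL does.
    print("torch intra-op threads:", torch.get_num_threads())

    # A call like this anywhere in the conversion script would also cap
    # the threads available to a shared OpenMP runtime:
    # torch.set_num_threads(1)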
---
Posted by nolan via TVM Discuss <no...@discuss.tvm.ai>.
There are no thread-related ops in the model. Besides, running with multiple threads is still faster than running with a single thread.