Posted to discuss-archive@tvm.apache.org by ckh via TVM Discuss <no...@discuss.tvm.ai> on 2020/04/08 06:11:26 UTC
[TVM Discuss] [Questions] The current version of tvm cannot find the configuration of conv2d


Hello.

On an RK3399 board, I observed a performance regression when running inference with the VGG-16 model.

Performance was measured using the test code below.

    import numpy as np
    import tvm
    import tvm.relay as relay
    import tvm.relay.testing  # explicit import so relay.testing.vgg resolves
    from tvm.contrib import graph_runtime

    # Compile for an AArch64 CPU and run on the local CPU context.
    target_arm_cpu = tvm.target.create('llvm -device=arm_cpu -target=aarch64-linux-gnu')
    ctx_arm_cpu = tvm.cpu()
    dtype = 'float32'
    batch_size = 1
    num_class = 1000
    image_shape = (3, 224, 224)
    data_shape = (batch_size,) + image_shape
    out_shape = (batch_size, num_class)

    # VGG-16 test workload from relay.testing.
    mod, paramsO = relay.testing.vgg.get_workload(
        num_layers=16, batch_size=batch_size, image_shape=image_shape)
    opt_level = 3

    # Build for arm_cpu.
    with relay.build_config(opt_level=opt_level):
        graph, lib, params = relay.build_module.build(mod, target_arm_cpu, params=paramsO)

    # Feed a random input and run the model once.
    data = tvm.nd.array(np.random.uniform(-1, 1, size=data_shape).astype(dtype), ctx_arm_cpu)
    module = graph_runtime.create(graph, lib, ctx_arm_cpu)
    module.set_input("data", data)
    module.set_input(**params)
    module.run()
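
The script above runs the model once and does not show the timing step. As a minimal sketch of how a mean/std-dev figure could be collected (not necessarily how the numbers in this post were produced), the helper below times any callable using only the Python standard library. `time_inference` and `format_result` are hypothetical names introduced for illustration; `module.run` from the script would be passed in as the callable. TVM's runtime module also offers a `time_evaluator` method that is commonly used for this purpose.

```python
import statistics
import time


def time_inference(run, repeat=10, warmup=1):
    """Call `run` repeatedly and return (mean_ms, std_ms) of wall-clock time."""
    for _ in range(warmup):
        run()  # warm-up calls are executed but not timed
    samples_ms = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # Population standard deviation over the timed runs.
    return statistics.mean(samples_ms), statistics.pstdev(samples_ms)


def format_result(mean_ms, std_ms):
    """Format a result in the same style as the figures quoted in this post."""
    return "Mean inference time (std dev): %.2f ms (%.2f ms)" % (mean_ms, std_ms)


# Assumed usage with the graph runtime module built above:
#   print(format_result(*time_inference(module.run)))
```

Wall-clock timing from Python includes a small call overhead per run, which should be negligible for a model that takes hundreds of milliseconds per inference.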

Running VGG-16 on the ARM CPU with the current TVM version gives the following result.

`Mean inference time (std dev): 1892.25 ms (2.20 ms)`

With the old TVM version the result is

`Mean inference time (std dev): 989.96 ms (0.80 ms)`

The new version is roughly 1.9x slower than the old one (1892.25 ms vs. 989.96 ms), which is too large a gap to ignore.

I suspect the new version of TVM does not find the conv2d configs for VGG-16.
Below is the log from compiling the VGG-16 model with Relay in the new version.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_winograd.arm_cpu', ('TENSOR', (1, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 224, 224), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 112, 112), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 128, 56, 56), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 56, 56), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 256, 28, 28), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 28, 28), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('conv2d_nchw_spatial_pack.arm_cpu', ('TENSOR', (1, 512, 14, 14), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (1000, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 4096), 'float32'), ('TENSOR', (4096, 4096), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (4096, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

And below is the log from compiling VGG-16 with the old TVM version.

    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (1000, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 4096, 'float32'), (4096, 4096, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
    Cannot find config for target=llvm -device=arm_cpu -target=aarch64-linux-gnu, workload=('dense', (1, 25088, 'float32'), (4096, 25088, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.

As the logs show, the old version of TVM emits no fallback-config warnings for conv2d, whereas the new version emits them for nine conv2d workloads.
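
To compare the two logs mechanically, the fallback warnings can be tallied per operator name. The following is a minimal sketch using only the Python standard library; `fallback_ops` is a hypothetical helper name, and the regular expression assumes the exact warning format shown in the logs above.

```python
import ast
import re
from collections import Counter

# Captures the workload tuple inside a "Cannot find config" warning line.
_WORKLOAD_RE = re.compile(r"workload=(\(.*\))\. A fallback configuration is used")


def fallback_ops(log_text):
    """Count fallback warnings per operator name (first element of the workload)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _WORKLOAD_RE.search(line)
        if match:
            # The workload is printed as a Python tuple literal, so it can be
            # parsed safely without eval().
            workload = ast.literal_eval(match.group(1))
            counts[workload[0]] += 1
    return counts


# Assumed usage, with a hypothetical file holding the captured compiler output:
#   print(fallback_ops(open("build_new.log").read()))
```

Applied to the full logs above, this would report one `conv2d_nchw_winograd.arm_cpu`, eight `conv2d_nchw_spatial_pack.arm_cpu` and three `dense_nopack.x86` fallbacks for the new version, versus three `dense` fallbacks and no conv2d entries for the old one.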

I think the current version of TVM fails to find the conv2d configs, and that this is what causes the performance degradation. Is this intended, or is it a problem inside TVM?





---
[Visit Topic](https://discuss.tvm.ai/t/the-current-version-of-tvm-cannot-find-the-configuration-of-conv2d/6277/1) to respond.
