Posted to discuss-archive@tvm.apache.org by Sergio via TVM Discuss <no...@discuss.tvm.ai> on 2020/04/16 19:54:53 UTC

[TVM Discuss] [Questions] ROCm 'segmentation fault' error when auto-tuning


When I run a modified version of the tutorial file "tune_relay_cuda.py" (using target = "rocm"), I get the following error some time after auto-tuning starts:

    Tuning...
    Task(func_name=topi_nn_conv2d, args=(('TENSOR', (1, 512, 14, 14), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'), kwargs={}, workload=('conv2d', (1, 512, 14, 14, 'float32'), (512, 512, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW', 'float32'))
    rocm
    [Task  1/ 9]  Current/Best:   21.86/3183.80 GFLOPS | Progress: (60/100) | 174.55 s

    Segmentation fault (core dumped)

I am using a Vega 20 AMD GPU and I was wondering if I should add the `-model xx` definition to the target to avoid this.

Has anybody experienced the same issue in the past? Any information on this would be greatly appreciated.





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/1) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.ai/email/unsubscribe/34b673dccab46cb884eb65902e6915fafa25aa8f2764d7f7e8cb45576a0a4c47).


Posted by Sergio via TVM Discuss <no...@discuss.tvm.ai>.

Downgrading to xgboost 0.90 fixed the segmentation fault issue!

Thanks a lot @t-vi
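For readers hitting the same crash, a minimal sketch of checking which xgboost release is installed before starting a long tuning run. The only version this thread confirms as working is 0.90; the helper names `release_tuple`, `installed_xgboost_version`, and `matches_known_good` are hypothetical and not part of TVM or xgboost.

```python
import re
from importlib import metadata

# Assumption: 0.90 is the only release this thread reports as working.
KNOWN_GOOD = "0.90"


def release_tuple(version):
    """Turn '0.90', '0.90.0' or '1.1.0rc1' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    # Drop trailing zeros so that '0.90' and '0.90.0' compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def installed_xgboost_version():
    """Return the installed xgboost version string, or None if it is absent."""
    try:
        return metadata.version("xgboost")
    except metadata.PackageNotFoundError:
        return None


def matches_known_good(version, known_good=KNOWN_GOOD):
    """True only when the given version equals the release reported to work."""
    if version is None:
        return False
    return release_tuple(version) == release_tuple(known_good)


if __name__ == "__main__":
    found = installed_xgboost_version()
    if matches_known_good(found):
        print("xgboost", found, "matches the release reported to work here")
    else:
        print("xgboost", found, "differs from", KNOWN_GOOD)
```

A mismatch does not prove that the installed release is broken; it only means it is not the one reported to work in this thread. If you decide to pin it, `pip install xgboost==0.90` is one way to do so.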





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/6) to respond.


Posted by Thomas V via TVM Discuss <no...@discuss.tvm.ai>.

Currently, we use the CUDA schedule (and op) on ROCm:

https://github.com/apache/incubator-tvm/blob/2cd987d92724be0f859bfb624ce797f9c70167bb/python/tvm/relay/op/strategy/rocm.py#L47-L50





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/8) to respond.


Posted by Sergio via TVM Discuss <no...@discuss.tvm.ai>.

Hi @t-vi,

I have one follow-up question. Do you know where the file defining the conv2d schedule for the ROCm backend is located? So far I have checked the file linked below, but I haven't been able to find the schedule template. I would appreciate any information in this regard.

https://github.com/apache/incubator-tvm/blob/2cd987d92724be0f859bfb624ce797f9c70167bb/topi/python/topi/rocm/conv2d.py





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/7) to respond.


Posted by Thomas V via TVM Discuss <no...@discuss.tvm.ai>.

Given that it happens after 60 steps, this might not be ROCm but rather the xgboost module. In that case, upgrading to the pre-release or downgrading helps.
https://github.com/apache/incubator-tvm/issues/4953#issuecomment-619255802

That said, we also fixed a potential segfault in the AMDGPU LLVM codegen last week, so upgrading to the latest TVM master might be a good idea.
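To help tell these two suspects apart, one option is to have CPython print a Python-level traceback when the process receives a fatal signal. The sketch below uses the standard-library `faulthandler` module; the wrapper name `enable_crash_tracebacks` is hypothetical, and it is an assumption that the crash happens in the main tuning process rather than in a measurement worker.

```python
import faulthandler


def enable_crash_tracebacks():
    """Dump the Python stack of all threads on SIGSEGV and similar signals.

    Call this once at the top of the tuning script, before tuning starts.
    Returns True if the fault handler is active afterwards.
    """
    # With no file argument, tracebacks are written to sys.stderr.
    faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    print("fault handler enabled:", enable_crash_tracebacks())
```

If the traceback printed at the crash ends inside xgboost calls, that points at the xgboost module; frames in TVM build or codegen calls would point the other way. The same effect is available without editing the script by running it with `python -X faulthandler`.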

Best regards

Thomas





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/5) to respond.


Posted by tqchen via TVM Discuss <no...@discuss.tvm.ai>.

You will need to compile with the MIOpen header in your include path. Alternatively, you can remove miopen.cc; this won't affect the AutoTVM part.





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/4) to respond.


Posted by Sergio via TVM Discuss <no...@discuss.tvm.ai>.

Hi @tqchen 

Thank you for your prompt reply. I am following the instructions in the link you sent, but when running the Makefile I get:

    rocm_runtime_pack.cc:33:52: fatal error: ../../src/contrib/miopen/conv_forward.cc: No such file or directory

I noticed that the directory

    ../../src/contrib/miopen

does not exist. I found the missing file in

    ../../src/runtime/contrib/miopen/

Maybe I should just modify the include path accordingly and run make again?
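The path change proposed above can be sketched as a small text rewrite. This is only an illustration under assumptions: the function name `retarget_miopen_includes` is hypothetical, the two directory prefixes are the ones quoted in this post, and it is assumed that the stale path appears on `#include` lines of rocm_runtime_pack.cc.

```python
# Directory named in the compiler error, and the one where the file was found.
OLD_PREFIX = "../../src/contrib/miopen/"
NEW_PREFIX = "../../src/runtime/contrib/miopen/"


def retarget_miopen_includes(source_text):
    """Rewrite #include lines that still use the old MIOpen directory."""
    fixed_lines = []
    for line in source_text.splitlines(keepends=True):
        # Only touch include directives; leave all other lines unchanged.
        if line.lstrip().startswith("#include") and OLD_PREFIX in line:
            line = line.replace(OLD_PREFIX, NEW_PREFIX)
        fixed_lines.append(line)
    return "".join(fixed_lines)


if __name__ == "__main__":
    sample = '#include "../../src/contrib/miopen/conv_forward.cc"\n'
    print(retarget_miopen_includes(sample), end="")
```

Applying the function twice leaves the text unchanged, because the new prefix does not contain the old one. As the reply in this thread notes, the MIOpen header still has to be on the include path for the build to succeed, unless miopen.cc is removed.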





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/3) to respond.


Posted by tqchen via TVM Discuss <no...@discuss.tvm.ai>.

Due to a limitation of the ROCm driver, you will need to set up an RPC server explicitly, as described in https://github.com/apache/incubator-tvm/tree/master/apps/rocm_rpc





---
[Visit Topic](https://discuss.tvm.ai/t/rocm-segmentation-fault-error-when-auto-tuning/6402/2) to respond.
