Posted to discuss-archive@tvm.apache.org by p via Apache TVM Discuss <no...@discuss.tvm.ai> on 2022/03/04 12:26:41 UTC

[Apache TVM Discuss] [Questions] Have anyone deployed rpc tracker to k8s cluster?


I have deployed an RPC tracker to a k8s cluster, and at the beginning it looked like it was working:
- devices can connect to the RPC tracker
- querying the RPC tracker lists free devices
- the tracker sits behind a k8s Service and runs as a Deployment with a single Pod

But when I run a simple benchmark, I get the following error:
```
Traceback (most recent call last):
  File "/workspace/tcl_scripts/benchmark.py", line 106, in <module>
    main(args)
  File "/workspace/tcl_scripts/benchmark.py", line 64, in main
    compile_upload_benchmark_model(args, mod, params, target)
  File "/workspace/tcl_scripts/benchmark.py", line 35, in compile_upload_benchmark_model
    args.rpc_key, args.rpc_tracker, args.rpc_port, timeout=500)
  File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 735, in request_remote
    remote = tracker.request(device_key, priority=priority, session_timeout=timeout)
  File "/workspace/python/tvm/rpc/client.py", line 418, in request
    "Cannot request %s after %d retry, last_error:%s" % (key, max_retry, str(last_err))
RuntimeError: Cannot request android after 5 retry, last_error:Traceback (most recent call last):
  3: TVMFuncCall
  2: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  1: tvm::runtime::RPCClientConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMArgs)
  0: tvm::runtime::RPCConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMArgs)
  File "/workspace/src/runtime/rpc/rpc_socket_impl.cc", line 72
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (sock.Connect(addr)) is false: Connect to 10.70.227.3:5001 failed

```
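The failing step in the traceback is `RPCConnect` to 10.70.227.3:5001. That is the address the tracker handed back for the `android` key, not the tracker's own port 9190: `tracker.request` only brokers an address, and the client then opens its own TCP connection to it. The toy model below sketches that flow under stated assumptions: the tracker stores the source IP it saw when the RPC server registered (unless the server reports a custom address) plus the port the server says it listens on. All class, method and parameter names are hypothetical and are not TVM's API.

```python
# Toy model of the RPC tracker flow (hypothetical names, not TVM's API).
# Assumption: the tracker records the IP it sees the server connect from
# (unless the server reports a custom address) together with the port the
# server says it listens on, and hands that (host, port) back to clients.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]


@dataclass
class ToyTracker:
    # key (e.g. "android") -> queue of free server addresses
    free: Dict[str, List[Address]] = field(default_factory=dict)

    def put(self, key: str, peer_ip: str, server_port: int,
            custom_addr: Optional[str] = None) -> None:
        """Called when an RPC server registers itself as free."""
        host = custom_addr if custom_addr is not None else peer_ip
        self.free.setdefault(key, []).append((host, server_port))

    def request(self, key: str) -> Address:
        """Return the address the client must then connect to directly."""
        queue = self.free.get(key)
        if not queue:
            raise LookupError(f"no free server for key {key!r}")
        return queue.pop(0)


tracker = ToyTracker()
# The device registers through the tracker port (9190); the tracker only
# sees the source IP of that connection, which may be a NAT or node address.
tracker.put("android", peer_ip="10.70.227.3", server_port=5001)
host, port = tracker.request("android")
print(f"client will connect to {host}:{port}")  # the device, not the tracker
```

Under this model, the 500x ports matter only where the RPC server listens, and that (host, port) pair must be reachable from the client at whatever address the tracker reports; opening them on the tracker Pod would not help.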

Service:
```
apiVersion: v1
kind: Service
metadata:
  name: tvm-rpc-tracker-service
spec:
  type: LoadBalancer
  selector:
    app: tvm-rpc-tracker
  ports:
    - name: rpc1
      protocol: TCP
      port: 9190
      targetPort: 9190
    - name: rpc2
      protocol: TCP
      port: 5000
      targetPort: 5000
    - name: rpc3
      protocol: TCP
      port: 5001
      targetPort: 5001
    - name: rpc4
      protocol: TCP
      port: 5002
      targetPort: 5002
    - name: rpc5
      protocol: TCP
      port: 5003
      targetPort: 5003
```


Deployment:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tvm.rpc-tracker-deployment
  labels:
    app: tvm-rpc-tracker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tvm-rpc-tracker
  template:
    metadata:
      labels:
        app: tvm-rpc-tracker
    spec:
      nodeSelector:
        location: dc
      containers:
      - name: tvm
        image: tvm:0.0.3
        command: ["/bin/bash", "-ec", "/usr/bin/python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190"]
        ports:
        - containerPort: 9190
        - containerPort: 5000
        - containerPort: 5001
        - containerPort: 5002
        - containerPort: 5003
```


Questions: how many of the 500* ports should I open/forward? Should all of them be TCP? Does anyone have an idea how to debug this?
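One way to start debugging is to test plain TCP reachability of each hop from the machine that runs benchmark.py, since that is where the failing `RPCConnect` is issued. The sketch below uses only the Python standard library; the helper name `can_connect` and the default timeout are illustrative choices, not part of TVM.

```python
# Minimal TCP reachability probe (stdlib only; helper name is hypothetical).
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused, unreachable, DNS failure and timeout
        # (socket.timeout and socket.gaierror are OSError subclasses).
        return False
```

For example, `can_connect("<tracker service address>", 9190)` returning True while `can_connect("10.70.227.3", 5001)` returns False would point at the client-to-device path rather than the tracker Service.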





---
[Visit Topic](https://discuss.tvm.apache.org/t/have-anyone-deployed-rpc-tracker-to-k8s-cluster/12242/1) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/ad1ab548c660d7f9474ec266a5710f307277fa462690d70be79d12472b40261c).


Posted by p via Apache TVM Discuss <no...@discuss.tvm.ai>.

I think I have solved my problem; it appears to have been a networking issue.




