Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/19 12:39:04 UTC
[GitHub] [tvm] pfk-beta opened a new issue, #11058: [Bug] tuning cannot send task but tracker - device connection is ok
pfk-beta opened a new issue, #11058:
URL: https://github.com/apache/tvm/issues/11058
Hello,
I believe the RPC connection between my computer and my Android device is set up correctly:
1. I compiled the `tvm_rpc` executable using Docker.android_demo with a modification: I added the OpenCL headers there and built the binary against them.
2. I copied the executable to the device, along with the `libc++_shared.so` and `libtvm_runtime.so` libraries.
3. When I run the command `adb -s HQUCI76PJJ7TFYLZ shell "LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=192.168.1.122 --port=5000 --tracker=192.168.1.135:9190 --key=prague --work-dir=/data/local/tmp/"`, it appears to work correctly. I see the following output:
```
[14:28:48] /workspace/apps/cpp_rpc/main.cc:96: host = 192.168.1.122
[14:28:48] /workspace/apps/cpp_rpc/main.cc:97: port = 5000
[14:28:48] /workspace/apps/cpp_rpc/main.cc:98: port_end = 9099
[14:28:48] /workspace/apps/cpp_rpc/main.cc:99: tracker = ('192.168.1.135', 9190)
[14:28:48] /workspace/apps/cpp_rpc/main.cc:100: key = prague
[14:28:48] /workspace/apps/cpp_rpc/main.cc:101: custom_addr =
[14:28:48] /workspace/apps/cpp_rpc/main.cc:102: work_dir = /data/local/tmp/
[14:28:48] /workspace/apps/cpp_rpc/main.cc:103: silent = False
[14:28:48] /workspace/apps/cpp_rpc/main.cc:264: Starting CPP Server, Press Ctrl+C to stop.
[14:28:48] /workspace/apps/cpp_rpc/rpc_server.cc:130: bind to 192.168.1.122:5000
[14:28:48] /workspace/apps/cpp_rpc/rpc_tracker_client.h:201: Tracker connecting to 192.168.1.135:9190
```
4. I compiled the runtime on my computer with RPC and LLVM enabled. When I run rpc_tracker and query it, I see the following information (I think it is working correctly):
```
Tracker address 192.168.1.135:9190
Server List
----------------------------
server-address key
----------------------------
192.168.1.210:5000 server:prague
192.168.1.122:5000 server:prague
```
```
python3 -m tvm.exec.rpc_tracker --port 9190
INFO:RPCTracker:bind to 0.0.0.0:9190
```
5. When I run the tuning script, `tvm_rpc` prints a short message: `[14:23:59] /workspace/apps/cpp_rpc/rpc_server.cc:300: Connection success 192.168.1.135:60674`. Then I get this stack trace from the tuning script:
```
File "/home/piotr/projects/odai-meta/odai-tvm/docker/tcl/tcl_scripts/tune.py", line 152, in <module>
main(args)
File "/home/piotr/projects/odai-meta/odai-tvm/docker/tcl/tcl_scripts/tune.py", line 99, in main
tune_model(args, mod, params, target)
File "/home/piotr/projects/odai-meta/odai-tvm/docker/tcl/tcl_scripts/tune.py", line 77, in tune_model
task_tuner.tune(
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/autotvm/tuner/xgboost_tuner.py", line 105, in tune
super(XGBTuner, self).tune(*args, **kwargs)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/autotvm/tuner/tuner.py", line 112, in tune
measure_batch = create_measure_batch(self.task, measure_option)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/autotvm/measure/measure.py", line 282, in create_measure_batch
attach_objects = runner.set_task(task)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/autotvm/measure/measure_methods.py", line 291, in set_task
raise RuntimeError(
RuntimeError: Cannot get remote devices from the tracker. Please check the status of tracker by 'python -m tvm.exec.query_rpc_tracker --port [THE PORT YOU USE]' and make sure you have free devices on the queue status.
```
(However, I am running the tracker query under `watch` in another terminal, and it keeps working correctly.)
6. After one minute, `tvm_rpc` prints the following message:
```
[14:24:59] /workspace/apps/cpp_rpc/rpc_server.cc:198: Child pid=12547 killed (timeout = 60), Process status = 15
[14:24:59] /workspace/apps/cpp_rpc/rpc_server.cc:229: Socket Connection Closed
```
and the tuning script exits with the following stack trace:
```
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/autotvm/measure/measure_methods.py", line 769, in _check
while not dev.exist: # wait until we get an available device
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/_ffi/runtime_ctypes.py", line 262, in exist
return self._GetDeviceAttr(self.device_type, self.device_id, 0) != 0
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/_ffi/runtime_ctypes.py", line 246, in _GetDeviceAttr
return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
4: TVMFuncCall
3: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), __mk_TVM1::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
2: tvm::runtime::RPCDeviceAPI::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)
1: non-virtual thunk to tvm::runtime::RPCClientSession::GetAttr(DLDevice, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)
0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::RPCEndpoint::Init()::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
File "/home/piotr/projects/odai-meta/odai-tvm/tvm/src/runtime/rpc/rpc_endpoint.cc", line 681
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (code == RPCCode::kReturn) is false: code=1
Done.
```
7. I am using the source code from the `v0.9.dev0` git tag both for building `tvm_rpc` and for running the tuning script. Is that OK?
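The traceback in step 6 ends inside a loop that polls `dev.exist` on the remote device, so the same check can be tried by hand, outside the tuner. Below is a minimal sketch, assuming a TVM build with RPC enabled on the host. `tvm.rpc.connect_tracker`, `request`, `device` and `dev.exist` are TVM calls; the helper name `remote_device_exists` and its `wait_s` and `connect` parameters are hypothetical and exist only for illustration (`connect` lets the helper be exercised without a device attached).

```python
import time


def remote_device_exists(key, host, port, device="opencl", wait_s=10.0, connect=None):
    """Request a session for `key` from the tracker and poll `dev.exist`.

    `connect` defaults to tvm.rpc.connect_tracker. It is a parameter only so
    that the helper can be tried with a stand-in tracker (an assumption of
    this sketch, not part of TVM).
    """
    if connect is None:
        # Imported lazily: requires a TVM build with RPC enabled.
        from tvm import rpc

        connect = rpc.connect_tracker

    tracker = connect(host, port)
    remote = tracker.request(key, session_timeout=60)
    dev = remote.device(device)

    deadline = time.monotonic() + wait_s
    while True:
        if dev.exist:  # same attribute the tuner polls in the traceback
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.5)
```

With the values from this report, the call would be `remote_device_exists("prague", "192.168.1.135", 9190)`. If it returns `False` or raises while the tracker still lists the server, one possibility worth checking is the device-side runtime (for example, whether it can see an OpenCL device) rather than the tracker connection.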
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] [Bug] tuning cannot send task but tracker - device connection is ok [tvm]
Posted by "pfk-beta (via GitHub)" <gi...@apache.org>.
pfk-beta closed issue #11058: [Bug] tuning cannot send task but tracker - device connection is ok
URL: https://github.com/apache/tvm/issues/11058
[GitHub] [tvm] pfk-beta commented on issue #11058: [Bug] tuning cannot send task but tracker - device connection is ok
Posted by GitBox <gi...@apache.org>.
pfk-beta commented on issue #11058:
URL: https://github.com/apache/tvm/issues/11058#issuecomment-1102600200
Tuning script:
```
import os
import sys
import argparse

import numpy as np

from tcl_scripts.common import load_model

import tvm
from tvm import relay, autotvm
import tvm.relay.testing
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner
from tvm.contrib.utils import tempdir
import tvm.contrib.graph_executor as runtime
from tvm.contrib import ndk

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")  # Change the filter in this process
    os.environ["PYTHONWARNINGS"] = "ignore"  # Also affect subprocesses


def create_tuning_log_name(args):
    target = args.target.replace(' ', '_')
    return f"{args.model_name}__{args.rpc_key}__{target}__{args.n_trials}.log"


def get_task_tuner(task, tuner_name):
    if tuner_name == "xgb" or tuner_name == "xgb-rank":
        tuner = XGBTuner(task, loss_type="rank")
    elif tuner_name == "ga":
        tuner = GATuner(task, pop_size=50)
    elif tuner_name == "random":
        tuner = RandomTuner(task)
    elif tuner_name == "gridsearch":
        tuner = GridSearchTuner(task)
    else:
        raise ValueError("Invalid tuner: " + tuner_name)
    return tuner


def tune_model(args, mod, params, target):
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="ndk"),
        runner=autotvm.RPCRunner(
            args.rpc_key,
            host=args.rpc_tracker,
            port=args.rpc_port,
            number=args.runner_number,
            repeat=args.runner_repeat,
            timeout=60,
        )
    )
    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=target,
        params=params,
        ops=(relay.op.get("nn.conv2d"),),
    )  # TODO: parametrize ops which will be tuned
    if not tasks:
        print("No tasks...")
    task_log_filename = None
    for i, task in enumerate(reversed(tasks)):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        task_log_filename = args.output_tuning_log + f".tmp"
        task_tuner = get_task_tuner(task, "xgb")
        if os.path.isfile(task_log_filename):
            task_tuner.load_history(
                autotvm.record.load_from_file(task_log_filename))
        n_trials = min(args.n_trials, len(task.config_space))
        task_tuner.tune(
            n_trial=n_trials,
            early_stopping=args.early_stopping,
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(n_trials, prefix=prefix),
                autotvm.callback.log_to_file(task_log_filename),
            ],
        )
    if task_log_filename:
        # pick best records to a cache file
        autotvm.record.pick_best(task_log_filename, args.output_tuning_log)
        os.remove(task_log_filename)


def main(args):
    target = tvm.target.Target(args.target, host=args.target_host)
    mod, params = load_model(args)
    tune_model(args, mod, params, target)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', required=True,
                        help="How do you name this model? "
                             "This value is used for generating name, "
                             "if output_tuning_log is not specified. No spaces.")
    parser.add_argument('--input_model', required=True,
                        help="Fullpath to model")
    parser.add_argument('--input_name', required=True,
                        help="Name of input node")
    parser.add_argument('--input_shape', required=True,
                        help="Shape of input node, comma-separated, no spaces.")
    parser.add_argument('--input_dtype',
                        default="float32", required=True,
                        help="Dtype of input node")
    parser.add_argument('--rpc_tracker',
                        required=True,
                        help="IP address of RPC tracker")
    parser.add_argument('--rpc_port',
                        type=int, required=True,
                        help="IP port of RPC tracker")
    parser.add_argument('--rpc_key',
                        required=True,
                        help="Key of RPC tracker")
    parser.add_argument('--output_tuning_log',
                        default=None,
                        help="Where to save tuning output to be used for benchmark.")
    parser.add_argument('--runner_number', type=int, default=4,
                        help="Number of separate benchmark runs")
    parser.add_argument('--runner_repeat', type=int, default=3,
                        help="Number of inference in one run")
    parser.add_argument('--n_trials',
                        type=int, default=10,
                        help="Number of trials. Must be larger than 1. Typically 2000...")
    parser.add_argument('--early_stopping',
                        type=int, default=400,
                        help='Early stopping for tuning. Ignore when no tuning.')
    parser.add_argument('--target', default="opencl", help="")
    parser.add_argument('--target_host',
                        default="llvm -mtriple=aarch64-linux-gnu", help="")
    args = parser.parse_args()
    args.input_shape = eval(args.input_shape)
    if not args.output_tuning_log:
        args.output_tuning_log = create_tuning_log_name(args)
    main(args)
```
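A side note on the script above, unrelated to the RPC error: `--input_shape` is passed through `eval`, which executes whatever is given on the command line. A minimal sketch of a stricter parser follows, assuming the comma-separated format described in the help text; `parse_shape` is a hypothetical helper and is not part of the original script.

```python
def parse_shape(text):
    """Parse a comma-separated shape such as "1,3,224,224" into a tuple of ints.

    Surrounding brackets and spaces are tolerated. Unlike eval, a single
    value such as "224" yields the one-element tuple (224,).
    """
    parts = [p.strip() for p in text.strip().strip("()[]").split(",")]
    dims = tuple(int(p) for p in parts if p)  # int() raises ValueError on non-numbers
    if not dims or any(d <= 0 for d in dims):
        raise ValueError(f"invalid shape: {text!r}")
    return dims
```

Because the helper raises `ValueError` on bad input, it could be passed to argparse as `type=parse_shape`, in which case argparse reports a malformed shape as a usage error and the later `eval` call is no longer needed.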
Re: [I] [Bug] tuning cannot send task but tracker - device connection is ok [tvm]
Posted by "pfk-beta (via GitHub)" <gi...@apache.org>.
pfk-beta commented on issue #11058:
URL: https://github.com/apache/tvm/issues/11058#issuecomment-1883793631
I'm pretty sure it was my fault, because I didn't know TVM well. I'm closing it, because my current setup is working.