Posted to dev@tvm.apache.org by mbarrett97 <no...@github.com> on 2019/10/18 16:36:55 UTC

[dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Auto-tuning currently relies on manually keeping track of various log files. This can quickly become quite unwieldy when tuning for many different devices, trying to do partial tuning or restarting a tuning session.

Proposals
------------

Create an offline library of auto-tuned configurations into which you can feed auto-tuning logs and have the optimal configurations saved. The library should store not just the configuration, but also the tuning conditions (e.g. tuner + no. of trials). This way, it is possible to check whether 'sufficient' tuning has already been done on a particular task and, if so, skip that task. I propose an interface to the library that would make a typical auto-tuning loop look something like the following:

```python
# Initialise a config library object pointing to some index file
# Probably have the default point to something like ~/.tvm/autotvm/...
config_library = ConfigLibrary('path/to/index.json')
tuner = 'xgb'

# Create a new auto-tuning 'job'
# The library will automatically generate a tmp log file for the job
config_library.start_job()

for i, tsk in enumerate(tasks):
    # get_trials returns the number of trials a task has been tuned for already
    trials_pretuned = config_library.get_trials(tsk)
    if trials_pretuned >= early_stopping or trials_pretuned >= len(tsk.config_space):
        logger.info("[Task  {}/{}]  Found in Config Library!".format(i + 1, len(tasks)))
        continue

    # Create a tuner
    tuner_obj = XGBTuner(tsk, loss_type="rank")

    # If transfer learning is being used, load the existing results
    if use_transfer_learning:
        # get_job_records returns the tuning records for the current job
        tuner_obj.load_history(config_library.get_job_records())

    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))

    # Perform the tuning
    tuner_obj.tune(
        n_trial=min(n_trial, len(tsk.config_space)),
        early_stopping=early_stopping,
        measure_option=measure_option,
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix=prefix),
            # New autotvm callback to log directly to the Config Library
            autotvm.callback.log_to_library(config_library),
        ],
    )

config_library.stop_job()
```

You would then use the library with something as simple as:

```python
with config_library:
    relay.build(...)
```

Additional Thoughts
------------

In order to reliably interact with existing records in the library, you need to be able to determine the exact platform/device that the tuning was performed on. I currently use the '-model' parameter to store this information (e.g. -model=hikey960), but it would be better to be able to store some arbitrary JSON object here so that additional platform configuration options can be specified (e.g. clock speeds, driver versions, etc.).
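
For illustration, the kind of platform descriptor I have in mind might look something like the following (all field names and values here are hypothetical, not part of any existing API):

```python
# Hypothetical platform descriptor; none of these fields exist in AutoTVM today.
platform_info = {
    "board": "hikey960",
    "cpu_governor": "performance",
    "gpu_clock_mhz": 903,
    "driver_version": "r18p0",
    "cooling": "passive heatsink",
}
```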

The current logging system is also heavily reliant on writing essentially flat text files. A config library would probably be better suited to a NoSQL/JSON database; however, for now I've stuck to keeping it flat.

I'll link my WIP PR shortly when it becomes available.

Comments/suggestions are welcomed!

https://github.com/dmlc/tvm/issues/4150

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by mbarrett97 <no...@github.com>.
@comaniac I think I understand where our different approaches are coming from. I was proposing that only the optimal configurations be permanently saved to the config library (like with TopHub) and a temporary log file of tuning configs would be maintained only during a tuning job. Storing all of the tuning history would rapidly result in huge files which I think would be fine in the case of a database but seems unwise for text files (in terms of search performance).

From my experience using AutoTVM, I often find interrupted tuning sessions occur while tuning a large multilayer network. In this case, I mostly care about skipping the layers that have already been fully tuned. Restarting the partially tuned layer from scratch is often not a significant time penalty in comparison. I see that this approach is not nearly as good in a workflow that involves iteratively tuning a network more and more, in which case you would save a significant amount of time by being able to resume using the tuning history.

A compromise between the two options might be, as you said, making the tuners deterministic. That way, just by knowing the number of trials, we can determine which configs can be skipped without needing to store the entire history. I don't think this can be made to work with the xgb tuner though (maybe just treat that as a special case?).

https://github.com/dmlc/tvm/issues/4150#issuecomment-544701150

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by mbarrett97 <no...@github.com>.
Thanks @kevinthesun and @comaniac for the responses!

> I'd prefer to keep the current AutoTVM dispatch context

I'm not intending to replace the existing dispatch context, only provide some syntactic sugar. We could just override the `__enter__` method of ConfigLibrary to do `apply_history_best`. I think it would be more intuitive than extracting the relevant .json file and passing it explicitly.
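
As a minimal sketch of that sugar (assuming a hypothetical `best_configs_file()` accessor on ConfigLibrary; this is not the PR code):

```python
from tvm import autotvm

class ConfigLibrary:
    def __init__(self, index_file):
        self.index_file = index_file

    def best_configs_file(self):
        # Hypothetical accessor: path to the JSON file holding the optimal configs.
        return self.index_file

    def __enter__(self):
        # Delegate to the existing dispatch context rather than replacing it.
        self._ctx = autotvm.apply_history_best(self.best_configs_file())
        self._ctx.__enter__()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return self._ctx.__exit__(exc_type, exc_value, traceback)
```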

> If we design this resume logic in a general way, we can also extend it to tophub.

Does it make sense to generalise here? As far as I can tell, TopHub doesn't store tuning history, just optimal configs, so there's no way to 'resume' a TopHub tuning session. In some way we have to determine whether the existing 'tuning effort' to produce a particular config is sufficient, and the number of trials is the only obvious way I can think of to characterise this. I'd be happy to look at any alternative implementation idea though.

> We can try to retrieve the target device info using system calls and add it to every record when dumping to the file/database.

This would be a good start, but I think this needs to also be something a user can fully specify. For instance, we might be interested in driver versions, memory clock speeds or even physical parameters such as board cooling. Which system calls were you considering using to determine the platform? Perhaps have a default method that relies on these calls with the ability to pass additional arbitrary info to `config_library.start_job()`?
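
To make that concrete, the default could be something as simple as the following sketch, which merges host details from Python's standard `platform` module with whatever the user passes in (the `extra_info` argument and field names are hypothetical):

```python
import platform

def default_platform_info(extra_info=None):
    # Basic host information gathered automatically (a starting point only).
    info = {
        "system": platform.system(),     # e.g. 'Linux'
        "machine": platform.machine(),   # e.g. 'aarch64'
        "processor": platform.processor(),
    }
    # User-specified details (driver versions, clock speeds, cooling, ...)
    # override or extend anything detected automatically.
    if extra_info:
        info.update(extra_info)
    return info

# Hypothetical usage: config_library.start_job(platform_info=default_platform_info({...}))
```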

> In my personal opinion, we also need to invalidate the history/records when TVM has been updated

I agree with this, but maybe it can be included as part of the previous point on board configuration? In a general sense we need an idea of whether a particular config is 'compatible' with our current platform and I think it's reasonable to include TVM version as a part of this.


https://github.com/dmlc/tvm/issues/4150#issuecomment-544433748

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Yao Wang <no...@github.com>.
For local log file management, how about we store the best K schedules for each workload? Users can choose how many schedules they would like to keep.
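
A rough sketch of what that could look like on top of the existing log format, using `autotvm.record.load_from_file` (the grouping helper itself is illustrative, not an existing API):

```python
from collections import defaultdict
from tvm.autotvm.record import load_from_file, encode

def best_k_per_workload(log_file, k=3):
    """Keep only the k fastest valid records for each workload (sketch)."""
    by_workload = defaultdict(list)
    for inp, res in load_from_file(log_file):
        if res.error_no != 0:  # skip failed measurements
            continue
        mean_cost = sum(res.costs) / len(res.costs)
        by_workload[inp.task.workload].append((mean_cost, encode(inp, res)))
    # Sort each workload's records by mean cost and keep the k fastest.
    return {wkl: [rec for _, rec in sorted(recs)[:k]]
            for wkl, recs in by_workload.items()}
```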

https://github.com/dmlc/tvm/issues/4150#issuecomment-545076379

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Cody Hao Yu <no...@github.com>.
Thanks for the RFC. I like the idea of the config library concept. Some concerns/questions:

- Same as @kevinthesun, I'd prefer to keep the current AutoTVM dispatch context instead of introducing a new one. For example, we can just overload `apply_history_best` to take either a JSON file like now or a config library as in this proposal.

- Current AutoTVM has a database module that stores tuned configs and results to a Redis database (although no one is using this feature AFAIK). Considering compatibility and usability, we should hide the resume logic from users and make all storage mechanisms consistent. For example, we can first move the "log to" callback to the tuner arguments:

```python
    tuner_obj.tune(
        n_trial=min(n_trial, len(tsk.config_space)),
        early_stopping=early_stopping,
        measure_option=measure_option,
        config_library="history.json",  # Can be either a string (JSON file name) or a DB object
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix=prefix),
        ],
    )
```

In this way, we can implement the resume logic you proposed in the constructor of tuners.

- For the rules that determine whether the history can be reused to resume tuning, I think it is the same general question that tophub faces. However, I would not suggest referring to tuning-specific information such as trial numbers, for the following reasons. First, users may require different trial numbers or different measure options. Second, this limits the history to being used only for exactly the same task. If we design this resume logic in a general way, we can also extend it to tophub.

In my personal opinion, when loading the history, in addition to checking whether the history config matches the current task in terms of target, op name, shapes and attributes, we also need to check the device as you mentioned. We can try to retrieve the target device info using system calls and add it to every record when dumping to the file/database.

Besides, we also need to invalidate the history/records when TVM has been updated, because it may result in different performance even for the same config on the same device. The simplest way is checking the timestamp. For example, we let users configure an expiration time when creating a tuner:

```python
    library_option = {"library": "history.json", "expiration": 1440}

    tuner_obj.tune(
        n_trial=min(n_trial, len(tsk.config_space)),
        early_stopping=early_stopping,
        measure_option=measure_option,
        library_option=library_option,
        callbacks=[
            autotvm.callback.progress_bar(n_trial, prefix=prefix),
        ],
    )
```

In this example, a user wants the tuner to load `history.json` and use all records generated within 1440 minutes (1 day). One advantage of using this approach is that we don't need to touch the searching part at all, but only let `Tuner.tune` bypass the measurement when the record is available already.
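
Since `MeasureResult` already carries a wall-clock timestamp, the expiration check could be done purely over the existing records; a minimal sketch (the `expiration` option itself is still hypothetical):

```python
import time
from tvm.autotvm.record import load_from_file

def load_unexpired_records(log_file, expiration_minutes):
    """Yield only the records measured within the last `expiration_minutes`."""
    cutoff = time.time() - expiration_minutes * 60
    for inp, res in load_from_file(log_file):
        # MeasureResult.timestamp is the absolute time the measurement finished.
        if res.timestamp >= cutoff:
            yield inp, res

# e.g. only reuse records from the last day:
# recent = list(load_unexpired_records("history.json", 1440))
```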

Again thanks for the proposal, and any of my suggestions can be adjusted/discussed if they are too overwhelming or unnecessary.

https://github.com/dmlc/tvm/issues/4150#issuecomment-543970017

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Cody Hao Yu <no...@github.com>.
@mbarrett97 I see your point. If the problem is narrowed down to "skip some tasks in a model when resuming tuning that was accidentally interrupted", then your proposal is a lightweight working solution. Maybe we can file another RFC focusing on more general history reuse support.

Then coming back to your proposal: the current solution is using `config_library` in the `log_to_file` callback so that it will store the configs as well as the trial number for each layer (task). According to your reply, are you going to store only the best config for each layer? I didn't see the corresponding implementation in your PR, though (please correct me if I missed it). If the config library only stores the best one for the sake of tuning performance and disk space, how do I store all configs like now if I prefer? In addition, I don't think storing all explored configs is a problem. I have stored the whole history all the time for research purposes and didn't notice any performance problems. As for disk space, my `history.json` for 2000 records is about 1.3 MB. Taking MobileNet V2 for instance, it has 31 tasks in total, meaning a ~40.3 MB history file. I think this is not a big problem for a modern disk.

https://github.com/dmlc/tvm/issues/4150#issuecomment-544713918

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Yao Wang <no...@github.com>.
Thank you for this proposal. This is helpful for managing local log files. One question about:
```python
with config_library:
    relay.build(...)
```
What is the relationship between config_library and the autotvm dispatch context? It seems that this design replaces the dispatch context with config_library. And how are different dispatch contexts managed in this case?

https://github.com/dmlc/tvm/issues/4150#issuecomment-543847139

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by mbarrett97 <no...@github.com>.
@icemelon9 This suggestion is more about infrastructure so that we're not required to keep track of individual log files and how they were produced. We need this to decide whether or not we can skip a task based on existing results.

@comaniac @kevinthesun I've updated the PR to include more concretely the ideas being discussed. I think an auto-tuning 'job' is distinct from a task, as I am using it to refer to a series of tasks tuned sequentially (e.g. tuning a network would be a 'job'). A JSON file containing all of the jobs is produced, which contains information such as the start/finish time of the job, target/platform parameters and, importantly, the optimal configs for each task in the job. In principle this would allow you to 'revert' an auto-tuning job from the config library if you discovered you'd done something invalid during a job (I've done this a few times...). Whether the entire history of a job is kept can be controlled by a flag.

I'm hacking one of the tutorial scripts to use the config library mechanism instead, `tune_with_config_library.py`. For convenience, here's the current tuning loop:

```python
def tune_kernels(tasks,
                 config_library,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 log_filename='tuning.log'):

    with config_library.tune(target):
        for i, tsk in enumerate(tasks):
            prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

            # converting conv2d tasks to conv2d_NCHWc tasks
            op_name = tsk.workload[0]
            if op_name == 'conv2d':
                func_create = 'topi_x86_conv2d_NCHWc'
            elif op_name == 'depthwise_conv2d_nchw':
                func_create = 'topi_x86_depthwise_conv2d_NCHWc_from_nchw'
            else:
                raise ValueError("Tuning {} is not supported on x86".format(op_name))

            task = autotvm.task.create(func_create, args=tsk.args,
                                       target=target, template_key='direct')
            task.workload = tsk.workload

            # create tuner
            if tuner == 'xgb' or tuner == 'xgb-rank':
                tuner_obj = XGBTuner(task, loss_type='rank')
            elif tuner == 'ga':
                tuner_obj = GATuner(task, pop_size=50)
            elif tuner == 'random':
                tuner_obj = RandomTuner(task)
            elif tuner == 'gridsearch':
                tuner_obj = GridSearchTuner(task)
            else:
                raise ValueError("Invalid tuner: " + tuner)

            # do tuning
            n_trial=10
            tuner_obj.tune(
                n_trial=n_trial,
                early_stopping=early_stopping,
                measure_option=measure_option,
                config_library=config_library,
                callbacks=[autotvm.callback.progress_bar(n_trial, prefix=prefix)],
            )
```

https://github.com/dmlc/tvm/issues/4150#issuecomment-546962151

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by mbarrett97 <no...@github.com>.
@comaniac Having given this some thought, I think it's reasonable to support both approaches. I didn't want to include full logs because I was hoping to also be able to use the config library to distribute tuned configs; however, it should be fine to just 'export' a config library with only the optimal configs.

In that case, I propose the following. Have each auto-tuning session create a new 'job'. This job will have an entry in a JSON file ('job index') containing at least the target string, the start/finish time of the job and a path to the history file generated. Optionally we permit some arbitrary JSON to describe the platform in more detail. By default, we delete the history file when a job completes (but keep the job entry in the index); however, a flag can be passed to retain the history.
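
Concretely, a single entry in the job index might look something like this (field names and values are purely illustrative, not the implemented schema):

```python
# Purely illustrative job-index entry; not the actual schema in the PR.
job_entry = {
    "job_id": 17,
    "target": "llvm -device=arm_cpu -model=hikey960",
    "started_at": "2019-10-28T09:12:03Z",
    "finished_at": "2019-10-28T14:47:51Z",
    "history_file": "path/to/job_17_history.json",  # deleted by default on completion
    "platform": {"driver_version": "r18p0", "cooling": "passive heatsink"},
}
```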

Now if a task needs to be resumed, first a simple check can be done to see if the existing optimal config has already been tuned with sufficiently many trials (and with the right tuner/platform). If so, skip it; otherwise, search the job index to see if any history files qualify to restart the tuning. In that case, we can use your proposal.
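
In rough Python, that decision could look like the following (every helper here, `get_trials`, `matches_platform` and `find_history_files`, is hypothetical):

```python
# Sketch only: get_trials, matches_platform and find_history_files are hypothetical.
def plan_task(config_library, task, n_trial, tuner_name, platform_info):
    trials_done = config_library.get_trials(task, tuner=tuner_name)
    if trials_done >= n_trial and config_library.matches_platform(task, platform_info):
        return "skip"                         # the stored optimal config is already good enough
    history_files = config_library.find_history_files(task, platform_info)
    if history_files:
        return ("resume", history_files)      # restart tuning from the stored history
    return "tune_from_scratch"
```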


https://github.com/dmlc/tvm/issues/4150#issuecomment-545041036

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Cody Hao Yu <no...@github.com>.
Some comments after reading the example and the current PR.

* The APIs are still confusing to me. I agree with the `job` part but not the others. `config_library` still doesn't look like a "library"; it's more like a job manager according to your proposal. The use case `config_library.tune()` is also weird, because we already use `tuner.tune` for each task. In my personal opinion, something like `job_manager.session()` would be more reasonable.

* I didn't see the features you claimed in either the PR or the example. Specifically, how do I control whether to record all configs or just the best one? How do I resume a job? I don't think the current PR will skip any well-tuned job when resuming. I would suggest making that example more concrete and realistic before the implementation so that we can all refer to it. I also think it's fine to create a separate tutorial for this feature. In summary, here are the points I wish to see in the tutorial:

  * How to create a config library and what the controllable options are.
  * How a config library resumes a job.
  * How to specify the log file mode (all configs or only the best).
  * How to apply the log file generated by the config library to the rest of the build process.


https://github.com/dmlc/tvm/issues/4150#issuecomment-547252660

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Haichen Shen <no...@github.com>.
@mbarrett97 I wonder why not just use the transfer learning in AutoTVM. After using transfer learning, AutoTVM will skip the tasks that have been tried before. See the example at
https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#begin-tuning

https://github.com/dmlc/tvm/issues/4150#issuecomment-546006385

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Cody Hao Yu <no...@github.com>.
> @comaniac Having given this some thought, I think it's reasonable to support both approaches. I didn't want to include full logs because I was hoping to also be able to use config library to distribute tuned configs, however it should be fine to just 'export' a config library with only optimal configs.
> 
Got your point, although I think you can always pick the best config before distribution, like the current AutoTVM use case. Currently, AutoTVM logs all configs to a JSON file, and if a user only wants to keep the best ones, she uses `autotvm.record.pick_best` to generate another small JSON file that only contains the best config for each layer (task).
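
For reference, that distillation step is already a one-liner (the file names here are just examples):

```python
from tvm.autotvm.record import pick_best

# Keep only the best record per task from the full tuning log.
pick_best("history.json", "history_best.json")
```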

> In that case, I propose the following. Have each auto-tuning session create a new 'job'. This job will have an entry in a JSON file ('job index') containing at least the target string, start/finish time of the job and a path to the history file generated. Optionally we permit some arbitrary JSON to describe the platform in more detail. By default, we delete the history file when a job completes (but keep the job entry in the index), however a flag can be passed to retain the history.
> 
If I understand correctly, you are going to add tuning process metadata to the JSON file in addition to the configs, like the example code snippet you proposed at the very beginning of this RFC. Since you propose to use the config library as the "database" to log all configs (the argument of the `log_to_file` callback), you have to make sure the current `history.json` is still always available with the explored configs whenever the task is interrupted. Maybe we can store two JSON files (e.g., task.json and history.json) to make a clear separation?

Another suggestion is naming. `ConfigLibrary` doesn't seem accurate in this case. To me, it's more like a `TaskContext` or `TaskSession`. Also, we should use 'task' instead of 'job' to be consistent.

> Now if a task needs to be resumed, first a simple check can be done to see if the existing optimal config has already been tuned with sufficiently many trials (and with the right tuner/platform). If so, skip it; otherwise, search the job index to see if any history files qualify to restart the tuning. In that case, we can use your proposal.

Yeah I think this part is relatively clear.


https://github.com/dmlc/tvm/issues/4150#issuecomment-545097314

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Tianqi Chen <no...@github.com>.
Thanks for the helpful discussion. Some of the common themes that I see:

- A need for more metadata to inform the tuner, if possible.
- The key question is whether the metadata is mandatory or serves as an auxiliary component.
  - e.g. we may not want the general features to have to depend on the metadata; while it is nice to have an option to start from trial n, it would be great if the tuner could still function without it.
- Everyone seems to agree on a library context that helps.

It would be great if we can dissect the discussion, e.g. reach a consensus on the metadata format that we prefer, and then talk about possible context library behaviors and the possibility of implementing different variants of libraries.


https://github.com/dmlc/tvm/issues/4150#issuecomment-544614703

Re: [dmlc/tvm] [RFC] [AutoTVM] Implementing an auto-tuning library/cache (#4150)

Posted by Cody Hao Yu <no...@github.com>.
Thanks for the responses; I think they are valuable. I embedded my opinions with yours and leave the dispatch context question to @kevinthesun.

Also cc @tqchen and @icemelon9 for their inputs.

> > If we design this resume logic in a general way, we can also extend it to tophub.
> 
> Does it make sense to generalise here? As far as I can tell, TopHub doesn't store tuning history just optimal configs, so there's no way to 'resume' a TopHub tuning session. In some way we have to determine whether the existing 'tuning effort' to produce a particular config is sufficient and number of trials is the only obvious way I can think of characterising this. I'd be happy to look at any alternative implementation idea though.

I agree with you that TopHub serves a different purpose if we consider the trial number in the resume logic, but they can still share the same implementation and history format in the way I suggested. My concern with using the trial number is that it limits the use case of this RFC to resuming interrupted tuning and nothing else, such as transferring the tuning process to other devices, or reusing the configs of a 2000-trial random search to launch a new grid search, etc.

Alternatively, we could decouple the history from a specific tuning process. Specifically, we do not add any tuning-process-specific information to the config library, but just let the tuner determine whether it can reuse a result from the config library when it needs to measure that config. For example, suppose the tuning process was interrupted at the 50th trial, so we have 50 configs in the library. When resuming the tuning, the tuner still starts from scratch, but it can save the time of measuring those 50 configs when it follows the same tuning process. One advantage is that this scenario is applicable to different tuners or even different models with the same task.
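
A rough sketch of that measurement bypass (the `lookup` call on the library is hypothetical; the rest is plain Python):

```python
# Sketch only: library.lookup is hypothetical. The tuner's measure step first
# checks the config library and only measures the cache misses.
def measure_with_cache(measure_batch, library, inputs):
    results = [None] * len(inputs)
    to_measure, missing_idx = [], []
    for i, inp in enumerate(inputs):
        cached = library.lookup(inp.task, inp.config)
        if cached is not None:
            results[i] = cached          # reuse the stored MeasureResult
        else:
            to_measure.append(inp)
            missing_idx.append(i)
    for i, res in zip(missing_idx, measure_batch(to_measure)):
        results[i] = res
    return results
```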

One drawback of my alternative compared to yours is that if the tuning process is non-deterministic (e.g., random search) then we might spend time on tuning different configs, but I think this can be worked around by either exposing an optional random seed argument in the tuner (such as the `random_state` used in `sklearn`), or letting the user reduce the trial number when resuming.
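
If we went the seed route, a seeded random tuner could be as small as the sketch below (purely illustrative; the current `RandomTuner` does not take a seed argument):

```python
import numpy as np

class SeededRandomTuner:
    """Illustrative only: a random tuner whose visit order is reproducible."""

    def __init__(self, task, seed=None):
        self.task = task
        # A fixed seed makes the sequence of explored configs deterministic,
        # so a resumed session revisits the same configs in the same order.
        self.rng = np.random.RandomState(seed)

    def next_batch(self, batch_size):
        space_len = len(self.task.config_space)
        indices = self.rng.randint(0, space_len, size=batch_size)
        return [self.task.config_space.get(int(i)) for i in indices]
```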

> 
> > We can try to retrieve the target device info using system calls and add it to every record when dumping to the file/database.
> 
> This would be a good start, but I think this needs to also be something a user can fully specify. For instance, we might be interested in driver versions, memory clock speeds or even physical parameters such as board cooling. Which system calls were you considering using to determine the platform? Perhaps have a default method that relies on these calls with the ability to pass additional arbitrary info to `config_library.start_job()`?
> 

I have the same question actually. This part is relatively vague and probably needs input from others.

> > In my personal opinion, we also need to invalidate the history/records when TVM has been updated
> 
> I agree with this, but maybe it can be included as part of the previous point on board configuration? In a general sense we need an idea of whether a particular config is 'compatible' with our current platform and I think it's reasonable to include TVM version as a part of this.

Your response reminded me that the current config history already includes version information, although it is always 0.1. Not sure if we can make use of it and save some effort.


https://github.com/dmlc/tvm/issues/4150#issuecomment-544611522