Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/02/18 11:55:32 UTC

[GitHub] [tvm-rfcs] noobotdj opened a new pull request #57: [RFC]FamilySeer: A new search method for Auto-scheduler

noobotdj opened a new pull request #57:
URL: https://github.com/apache/tvm-rfcs/pull/57


   RFC topic in forum: https://discuss.tvm.apache.org/t/rfc-familyseer-a-new-search-method-for-auto-scheduler/11877
   
   @comaniac @junrushao1994


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm-rfcs] comaniac commented on pull request #57: [RFC]FamilySeer: A new search method for Auto-scheduler

Posted by GitBox <gi...@apache.org>.
comaniac commented on pull request #57:
URL: https://github.com/apache/tvm-rfcs/pull/57#issuecomment-1045334149


   Thanks for the RFC. Will review next week.





[GitHub] [tvm-rfcs] comaniac commented on a change in pull request #57: [RFC]FamilySeer: A new search method for Auto-scheduler

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #57:
URL: https://github.com/apache/tvm-rfcs/pull/57#discussion_r812308466



##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+- Feature Name: (FamilySeer: A new search method for Auto-scheduler)
+- Start Date: (2021-01-07)
+- RFC PR: [apache/tvm-rfcs#57](https://github.com/apache/tvm-rfcs/pull/57)
+- GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875)
+
+# Summary
+[summary]: #summary
+
+We propose FamilySeer, a new search method that improves the search efficiency and search quality of the Auto-scheduler. We introduce several features:
+
+- FamilySeer exploits subgraph similarity to form a collection of subgraph families and constructs a cost model per subgraph family to improve cost model accuracy.
+- We enable subgraphs within each family to share search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating convergence to optimal results.
+- We also make some general optimizations, such as enabling parallel measurement on a single node with multiple GPUs and training the cost model on the GPU.
+
+# Motivation
+[motivation]: #motivation
+
+Auto-scheduler (Ansor) uses code sketches and optimization rules to generate a large search space. The search space defined by Ansor offers great optimization opportunities, so the search quality and search efficiency are determined by how we explore that space.
+
+Ansor utilizes an improved cost model and a task scheduler to help explore the search space. The cost model analyzes and finds high-performance code transformations in the search space, and the task scheduler allocates the time budget to different computation graphs. However, we find several drawbacks to this approach:
+
+The accuracy of the cost model determines the search quality, but Ansor uses a monolithic cost model to predict different computation graphs (subgraphs), resulting in an accuracy loss during tuning.
+
+The task scheduler allocates most of the time budget to the subgraphs with the most potential for improvement (i.e., those with the highest latency). This approach works well at the beginning of autotuning. However, as these promising subgraphs gradually reach their peak performance given an adequate time budget, the other subgraphs are left with little time budget to reach their own peak performance.
+
+The search process can end up taking dozens of hours. This motivates us to find a better way to explore the search space.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+We integrate our search method into the Auto-scheduler, so users only need to change a few parameters to enable it.
+
+We use the code below in [Auto-scheduling a Neural Network for NVIDIA GPU](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#begin-tuning) as an example:
+
+```python
+#...
+
+# load all tasks into the tuner
+tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
+
+# define tuning options for the tuner
+tune_option = auto_scheduler.TuningOptions(
+    num_measure_trials=200,  # change this to 20000 to achieve the best performance
+    runner=measure_ctx.runner,
+    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
+)
+
+# start tuning
+# tuner.tune(tune_option)  # previously: tune without the new parameter
+tuner.tune(tune_option, search_policy="sketch.xgb.family_op")

Review comment:
       The policy name could be more informative. `family_op` is not a common term after all.

##########
File path: rfcs/0057-FamilySeer.md
##########
+The `tuner` loads the `tune_option` into the `tune` function. There are several parameters in the `tune` function (refer to the class [TaskScheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_<family_algorithm>`. We currently provide two family algorithms as options: `op` classifies subgraphs based on their core operation, and `hash` classifies subgraphs based on their operation sequence. We recommend using `op` to achieve better performance.
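The policy string is a dot-separated triple. A standalone sketch of how such a string could be decomposed (the function `parse_search_policy` is a hypothetical illustration, not part of the TVM API):

```python
def parse_search_policy(search_policy):
    """Split a policy string such as 'sketch.xgb.family_op' into its parts."""
    parts = search_policy.split(".")
    if len(parts) == 2:
        # e.g. "sketch.xgb": no family algorithm requested
        sketch, model = parts
        return sketch, model, None
    if len(parts) == 3:
        # e.g. "sketch.xgb.family_op": the suffix after "_" selects the algorithm
        sketch, model, group = parts
        _, family_algo = group.split("_")
        return sketch, model, family_algo
    raise ValueError("Invalid search policy: " + search_policy)
```

With this reading, `sketch.xgb.family_op` selects the sketch search, the XGBoost cost model, and the `op` family algorithm.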
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Our search method consists of three steps:
+
+1. Identifying similar subgraphs
+```python
+def make_family_group(
+  tasks,
+  search_policy,
+):
+  """Identify each subgraph and group similar subgraphs into families.
+  """
+  if search_policy == "default":
+      search_policy = "sketch.xgb"
+
+  if isinstance(search_policy, str):
+      policy = search_policy.split(".")
+      if len(policy) == 2:
+          return {}
+      elif len(policy) == 3:
+          _, _, model_group = policy
+          _, class_type = model_group.split("_")
+      else:
+          raise ValueError("Invalid search policy: " + search_policy)
+      
+  family_group = {}
+  if class_type == "op":
+      for idx, task in enumerate(tasks):
+          task_layers = task.desc.split('_')
+          if task_layers[1] not in family_group:
+              family_group[task_layers[1]] = []
+              family_group[task_layers[1]].append(idx)
+          else:
+              family_group[task_layers[1]].append(idx)
+
+  elif class_type == "hash":
+      for idx, task in enumerate(tasks):
+          first = task.workload_key.find("[\"") + 2
+          end = task.workload_key.find("\",")
+          task_hash = task.workload_key[first:end]
+          if task_hash not in family_group:
+              family_group[task_hash] = []
+              family_group[task_hash].append(idx)
+          else:
+              family_group[task_hash].append(idx)
+
+  elif class_type == "ind":

Review comment:
       Is `ind` the third version? I didn't find it in the previous section.

##########
File path: rfcs/0057-FamilySeer.md
##########
+  elif class_type == "ind":
+      for idx, task in enumerate(tasks):
+          if task.workload_key not in family_group:
+              family_group[task.workload_key] = []
+              family_group[task.workload_key].append(idx)
+          else:
+              family_group[task.workload_key].append(idx)
+      
+  if family_group:
+      for key, value in family_group.items():
+          print("family group :", key, "---", value)
+
+  return family_group
+```
+
+We use static analysis to classify the subgraphs (tasks) based on their attributes.
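As a toy illustration of the `op` strategy above (the task descriptions are invented, not real TVM task descriptors), the grouping keys each task index by its core-operation token:

```python
# Hypothetical task descriptions; the second "_"-separated token plays the
# role of the core operation, as in the `op` family algorithm above.
task_descs = ["vm_conv2d_layer1", "vm_conv2d_layer2", "vm_dense_head", "vm_conv2d_layer3"]

family_group = {}
for idx, desc in enumerate(task_descs):
    core_op = desc.split("_")[1]                 # "conv2d" or "dense"
    family_group.setdefault(core_op, []).append(idx)
```

All conv2d tasks end up in one family and share a cost model, while the dense task forms its own family.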
+
+2. Constructing family cost model
+```python
+elif "family" in model_group:
+  # generate cost model for each family
+  cost_model_pool = []
+  for _, group in family_group.items():
+    if model_type == "xgb":
+      cost_model = XGBModel(
+          num_warmup_sample=len(group) * num_measures_per_round,
+          model_file=load_model_file,
+          adapative_training=adapative_training,
+      )
+      if load_model_file and os.path.isfile(load_model_file):
+        logger.info("TaskScheduler: Load pretrained model...")
+        cost_model.load(load_model_file)
+      elif load_log_file:
+        logger.info("TaskScheduler: Reload measured states and train the model...")
+        cost_model.update_from_file(load_log_file)
+    elif model_type == "random":
+        cost_model = RandomModel()
+    else:
+        raise ValueError("Invalid search policy: " + search_policy)
+    cost_model_pool.append(cost_model)
+  
+  # bind each subgraph (task) with its family cost model
+  search_policies = []
+  for task_idx, task in enumerate(tasks):
+    for group_idx, group in enumerate(family_group.values()):
+        if task_idx in group:
+          search_policies.append(
+            SketchPolicy(
+                task,
+                cost_model_pool[group_idx],
+                params=search_policy_params,
+                verbose=verbose,
+                init_search_callbacks=init_search_callbacks,
+            )
+          )
+
+```
+
+After identifying similar subgraphs, we return `family_group` (a mapping from family key to the task indices in that family) and build a cost model for each subgraph family.
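The binding step above can be sketched in plain Python (the family keys and model names below are placeholders for real task indices and `XGBModel` instances): inverting the grouping gives each task index a direct reference to its family's cost model.

```python
family_group = {"conv2d": [0, 1, 3], "dense": [2]}
cost_model_pool = ["xgb_for_conv2d", "xgb_for_dense"]  # one model per family

# Invert the grouping: task index -> that family's cost model.
task_to_model = {}
for group_idx, members in enumerate(family_group.values()):
    for task_idx in members:
        task_to_model[task_idx] = cost_model_pool[group_idx]
```

This assumes the iteration order of `family_group.values()` matches the order in which `cost_model_pool` was built, which is the same assumption the nested loop above relies on.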
+
+3. Foresee tuning
+
+```python
+def _tune_family_task(self, task_idx_groups, skip_measures_per_round):
+  """Tune the selected family tasks for one round.
+  """
+  for task_idx in task_idx_groups:
+    # Run pre-tune callbacks
+    for callback in self.callbacks:
+      callback.pre_tune(self, task_idx)
+
+    measure_inputs, measure_results = self.search_policies[task_idx].continue_search_one_round(
+      skip_measures_per_round, self.measurer
+    )
+
+    # ...
+```
+
+The foresee tuning takes `task_idx_groups` (a list of task indices from the selected subgraph families) and `skip_measures_per_round` as inputs and tunes all the subgraphs in the list.
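A minimal standalone sketch of one such per-family tuning round (the `measure` callable stands in for `continue_search_one_round`; all names here are illustrative, not the RFC's API):

```python
def tune_family_round(task_idx_group, measure):
    """Run one tuning round over every task in a family, collecting results."""
    results = {}
    for task_idx in task_idx_group:
        results[task_idx] = measure(task_idx)  # stand-in for a real measurement
    return results
```

Because every task in the family shares the same cost model, results gathered for one task in the round also improve the predictions for its siblings.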
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+When searching a larger search space (such as one induced by a larger batch size), FamilySeer performs similarly to, and sometimes worse than, the Auto-scheduler. This is because a larger search space requires more time before the cost model can provide accurate predictions. Applying an inaccurate cost model during foresee tuning may result in spending the time budget on non-improving code transformations.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+The Auto-scheduler generates a sufficiently large search space, so searching that space efficiently is important. With FamilySeer, users can find the same optimal code within a smaller time budget. We hope our search method can be an alternative for those who want to obtain better optimized code under a limited time budget.
+
+# Prior art
+[prior-art]: #prior-art
+
+Please refer to [this paper](https://arxiv.org/abs/2201.00194).

Review comment:
       This is not a "prior" art lol
   You should just put this link to the summary section. And for this section, you should put other related work (e.g., Ansor, FlexTensor, AutoTVM, etc).

##########
File path: rfcs/0057-FamilySeer.md
##########
+3. Foresee tuning
+
+```python
+def _tune_family_task(self, task_idx_groups, skip_measures_per_round):
+  """Tune the selected family tasks for one round.
+  """
+  for task_idx in task_idx_groups:
+    # Run pre-tune callbacks
+    for callback in self.callbacks:
+      callback.pre_tune(self, task_idx)
+
+    measure_inputs, measure_results = self.search_policies[task_idx].continue_search_one_round(
+      skip_measures_per_round, self.measurer
+    )
+
+    # ...
+```
+
+The foresee tuning takes `task_idx_groups` (a list of task indices from the selected subgraph families) and `skip_measures_per_round` as inputs and tunes all the subgraphs in the list.

Review comment:
       What is `skip_measures_per_round`?

##########
File path: rfcs/0057-FamilySeer.md
##########
+The `tuner` loads the `tune_option` into the `tune` function. There are several parameters in the `tune` function (refer to the class [TaskScheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_<family_algorithm>`. We currently provide two family algorithms as options: `op` classifies subgraphs based on their core operation, and `hash` classifies subgraphs based on their operation sequence. We recommend using `op` to achieve better performance.

Review comment:
       By this description, `op` does not seem like an accurate term. IIUC, both `op` and `hash` target subgraphs; the only difference is the way the similarity is determined, so again, it would be good to have better naming.
   
   In addition, since you recommend using `op` for better performance, could you elaborate on the need for `hash`?
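For illustration, the two grouping criteria under discussion could be sketched as follows. The task dictionaries, descriptor strings, and workload-key format below are hypothetical stand-ins, not Auto-scheduler's actual data structures:

```python
# Minimal sketch (assumed data, not the RFC's implementation): both "op" and
# "hash" group subgraphs; they differ only in the similarity key used.

def group_by(tasks, key_fn):
    """Group task indices by an arbitrary similarity key."""
    groups = {}
    for idx, task in enumerate(tasks):
        groups.setdefault(key_fn(task), []).append(idx)
    return groups

tasks = [
    {"desc": "vm_conv2d_nchw", "workload_key": '["a1b2", 1, 64]'},
    {"desc": "vm_conv2d_nhwc", "workload_key": '["c3d4", 1, 64]'},
    {"desc": "vm_dense_small", "workload_key": '["a1b2", 1, 64]'},
]

# "op": similarity key = the anchor operator parsed out of the description.
by_op = group_by(tasks, lambda t: t["desc"].split("_")[1])
# "hash": similarity key = the structural hash embedded in the workload key.
by_hash = group_by(tasks, lambda t: t["workload_key"].split('"')[1])
```

With these made-up tasks, `by_op` merges the two conv tasks while `by_hash` merges the two tasks sharing a structural hash, which is why the naming matters.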

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+- Feature Name: (FamilySeer: A new search method for Auto-scheduler)
+- Start Date: (2021-01-07)
+- RFC PR: [apache/tvm-rfcs#57](https://github.com/apache/tvm-rfcs/pull/57)
+- GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875)

Review comment:
       The link you put is to the pull request, not the tracking issue. Please open a new issue to track the PR progress and put the link here.

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+- Feature Name: (FamilySeer: A new search method for Auto-scheduler)
+- Start Date: (2021-01-07)
+- RFC PR: [apache/tvm-rfcs#57](https://github.com/apache/tvm-rfcs/pull/57)
+- GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875)
+
+# Summary
+[summary]: #summary
+
+We propose FamilySeer, a new search method that optimizes the search efficiency and quality of Auto-scheduler. We introduce several features:
+
+- FamilySeer exploits subgraph similarity to form a collection of subgraph families and constructs cost models on a per-family basis to improve cost model accuracy.
+- We enable subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results.
+- We also make some general optimizations, such as enabling parallel measurement on a single node with multiple GPUs and training the cost model on GPU.
+
+# Motivation
+[motivation]: #motivation
+
+Auto-scheduler (Ansor) uses code sketches and optimization rules to generate a large search space. The search space defined by Ansor offers great opportunities, so the search quality and efficiency are determined by how we explore that space.
+
+Ansor utilizes an improved cost model and a task scheduler to help explore the search space. The cost model analyzes and finds high-performance code transformations in the search space, and the task scheduler allocates the time budget across different computation graphs. However, we find several drawbacks to this approach:
+
+The accuracy of the cost model determines the search quality, but Ansor uses a monolithic cost model to predict different computation graphs (subgraphs), resulting in an accuracy loss during tuning.
+
+The task scheduler allocates most of the time budget to subgraphs with the most improvement potential (i.e., those with the highest latency). This approach works well at the beginning of autotuning. However, as a promising subgraph gradually reaches its peak performance given an adequate time budget, the other subgraphs are left with too little time budget to reach their peak performance.

Review comment:
       IMHO, the solution to this problem is to improve the task scheduler strategy. Since the task scheduler predicts the peak performance of each task, as long as that prediction is accurate enough, it shouldn't spend more time on a task that has already achieved its peak performance.
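The allocation strategy this comment alludes to could be sketched roughly as follows: pick the task with the largest predicted remaining gain, weighted by the task weight, so a task whose measured latency already matches its predicted peak receives no further trials. All numbers and the function interface are illustrative assumptions, not TVM's task scheduler:

```python
def pick_next_task(cur_latency, predicted_peak, weights):
    """Pick the task index with the largest weighted predicted gain.

    cur_latency / predicted_peak / weights are parallel lists;
    returns None when no task is predicted to improve further.
    """
    best_idx, best_gain = None, 0.0
    for i, (cur, peak, w) in enumerate(zip(cur_latency, predicted_peak, weights)):
        gain = (cur - peak) * w  # predicted remaining improvement
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx

# Task 0 has already reached its predicted peak (1.0 vs 1.0),
# so the scheduler picks task 1, which can still improve.
idx = pick_next_task([1.0, 2.0], [1.0, 1.2], [1, 1])
```

Under this sketch, an accurate peak prediction automatically starves converged tasks of budget, which is the behavior the comment argues for.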

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Our search method consists of three steps:
+
+1. Identifying similar subgraphs
+```python
+def make_family_group(
+  tasks,
+  search_policy,
+):
+  """identify each subgraphs and group them into subgraph family.
+  """
+  if search_policy == "default":
+      search_policy = "sketch.xgb"
+
+  if isinstance(search_policy, str):
+      policy = search_policy.split(".")
+      if len(policy) == 2:
+          return {}
+      elif len(policy) == 3:
+          _, _, model_group = policy
+          _, class_type = model_group.split("_")
+      else:
+          raise ValueError("Invalid search policy: " + search_policy)
+      
+  family_group = {}
+  if class_type == "op":
+      for idx, task in enumerate(tasks):
+          task_layers = task.desc.split('_')
+          if task_layers[1] not in family_group:
+              family_group[task_layers[1]] = []
+              family_group[task_layers[1]].append(idx)
+          else:
+              family_group[task_layers[1]].append(idx)
+
+  elif class_type == "hash":
+      for idx, task in enumerate(tasks):
+          first = task.workload_key.find("[\"") + 2
+          end = task.workload_key.find("\",")
+          task_hash = task.workload_key[first:end]
+          if task_hash not in family_group:
+              family_group[task_hash] = []
+              family_group[task_hash].append(idx)
+          else:
+              family_group[task_hash].append(idx)
+
+  elif class_type == "ind":
+      for idx, task in enumerate(tasks):
+          if task.workload_key not in family_group:
+              family_group[task.workload_key] = []
+              family_group[task.workload_key].append(idx)
+          else:
+              family_group[task.workload_key].append(idx)
+      
+  if family_group is not None:
+      for key, value in family_group.items():
+          print("family group :", key, "---", value)
+
+  return family_group
+```
+
+We use static analysis to classify the subgraphs (tasks) based on their attributes.
+
+2. Constructing family cost model
+```python
+elif "family" in model_group:
+  # generate cost model for each family
+  cost_model_pool = []
+  for _,group in family_group.items():
+    if model_type == "xgb":
+      cost_model = XGBModel(
+          num_warmup_sample=len(group) * num_measures_per_round,
+          model_file=load_model_file,
+          adapative_training=adapative_training,
+      )
+      if load_model_file and os.path.isfile(load_model_file):
+        logger.info("TaskScheduler: Load pretrained model...")
+        cost_model.load(load_model_file)
+      elif load_log_file:
+        logger.info("TaskScheduler: Reload measured states and train the model...")
+        cost_model.update_from_file(load_log_file)
+    elif model_type == "random":
+        cost_model = RandomModel()
+    else:
+        raise ValueError("Invalid search policy: " + search_policy)
+    cost_model_pool.append(cost_model)
+  
+  #bind each subgraph(task) with its family cost model
+  search_policies = []
+  for task_idx,task in enumerate(tasks):
+    for group_idx,group in enumerate(family_group.values()):
+        if task_idx in group:
+          search_policies.append(
+            SketchPolicy(
+                task,
+                cost_model_pool[group_idx],
+                params=search_policy_params,
+                verbose=verbose,
+                init_search_callbacks=init_search_callbacks,
+            )
+          )
+
+```
+
+After identifying similar subgraphs, we return `family_group` (a mapping of subgraph families to task indices) and build a cost model for each subgraph family.
+
+3. Foresee tuning
+
+```python
+def _tune_family_task(self, task_idx_groups,skip_measures_per_round):
+  """Tune the select family task for one round.
+  """
+  for task_idx in task_idx_groups:

Review comment:
       IIUC, in this case, once the TaskScheduler picks a family, all tasks in that family will be tuned for one round? If so, it seems inefficient, as some tasks may not need that many trials. Can we still let the task scheduler pick one task at a time, but use a different cost model based on the family the task belongs to?
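The alternative suggested here, one task per round but a family-shared cost model, might look like this sketch. The scheduler class, the stand-in "cost model", and the task names are placeholders, not TVM's APIs:

```python
class FamilyAwareScheduler:
    """Sketch: pick one task at a time, but share one cost model per family."""

    def __init__(self, tasks, family_of, make_model):
        # One cost model per family; every task in a family points to it.
        self.models = {}
        self.task_model = []
        for task in tasks:
            fam = family_of(task)
            if fam not in self.models:
                self.models[fam] = make_model()
            self.task_model.append(self.models[fam])

    def tune_one(self, task_idx, records):
        # Measurements from any family member train the shared model,
        # so per-task scheduling still benefits from family grouping.
        self.task_model[task_idx].extend(records)
        return self.task_model[task_idx]

sched = FamilyAwareScheduler(
    ["conv_a", "conv_b", "dense_c"],
    family_of=lambda t: t.split("_")[0],
    make_model=list,  # stand-in "cost model": a list collecting records
)
sched.tune_one(0, ["m0"])
model = sched.tune_one(1, ["m1"])  # task 1's model already saw task 0's data
```

This keeps the existing one-task-per-round scheduling loop intact; only the model lookup changes, which addresses the wasted-trials concern.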
   

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+  family_group = {}
+  if class_type == "op":
+      for idx, task in enumerate(tasks):
+          task_layers = task.desc.split('_')
+          if task_layers[1] not in family_group:

Review comment:
       It seems to me that this is a bit too ad-hoc:
   1. It relies on `task.desc`, which is supposed to be used only for user reference, and its format isn't guaranteed.
   2. It identifies the second token (i.e., `task_layers[1]`) as the anchor op, but this may not hold 100% of the time.
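To illustrate the second point, here are hypothetical task descriptions (made up, not from a real tuning run) where the second `_`-separated token is not the anchor op:

```python
# Hypothetical descriptors: the anchor ops are conv2d and dense,
# but the second token of both strings is the generic "mod".
descs = [
    "vm_mod_fused_nn_conv2d_add",
    "vm_mod_fused_nn_dense",
]
second_token = [d.split("_")[1] for d in descs]
# Both descriptors yield the same key, so two different anchor ops
# would be merged into a single family.
```

With descriptors like these, a format change in `task.desc` silently changes the grouping, which is the fragility the comment points at.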

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+3. Foresee tuning
+
+```python
+def _tune_family_task(self, task_idx_groups,skip_measures_per_round):
+  """Tune the select family task for one round.
+  """
+  for task_idx in task_idx_groups:
+    # Run pre-tune callbacks
+    for callback in self.callbacks:
+      callback.pre_tune(self, task_idx)
+
+    measure_inputs, measure_results = self.search_policies[task_idx].continue_search_one_round(
+      skip_measures_per_round, self.measurer
+    )
+
+    ……
+```
+
+The foresee tuning takes `task_idx_groups` (a list of subgraph families) and `skip_measures_per_round` as inputs and tunes all the subgraphs in the list.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+When searching a larger search space (such as a larger batch size), FamilySeer performs similarly to or sometimes worse than Auto-scheduler. This is because a larger search space requires more time before the cost model can provide accurate predictions. Deploying an inaccurate cost model in foresee tuning may result in spending the time budget on unprofitable code transformations.

Review comment:
       It seems to me that this can be improved somehow. For example, at the warmup stage (e.g., the first N trials), all cost models share the same training data from all tasks so that you have the same behavior as the current auto-scheduler. Afterward, you apply different data to each cost model to benefit from the task groups.
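The warmup idea could be prototyped as a thin wrapper that feeds every measurement to all family models during the first N trials and only to the owning family's model afterward. The `warmup_trials` threshold and the list-based stand-in for a model's training set are illustrative assumptions:

```python
class WarmupSharedTrainer:
    """Share training data across all family models for the first N records."""

    def __init__(self, family_models, warmup_trials):
        self.family_models = family_models  # {family: list of training records}
        self.warmup_trials = warmup_trials
        self.seen = 0

    def add_record(self, family, record):
        if self.seen < self.warmup_trials:
            # Warmup: every model trains on every measurement,
            # mimicking the single monolithic cost model.
            for records in self.family_models.values():
                records.append(record)
        else:
            # Afterward: specialize each model to its own family's data.
            self.family_models[family].append(record)
        self.seen += 1

models = {"conv": [], "dense": []}
trainer = WarmupSharedTrainer(models, warmup_trials=2)
trainer.add_record("conv", "r0")   # warmup: goes to both families
trainer.add_record("dense", "r1")  # warmup: goes to both families
trainer.add_record("conv", "r2")   # specialized: conv only
```

This would reproduce the current auto-scheduler's behavior while the models are data-starved, then switch to per-family data once each family has enough records.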

##########
File path: rfcs/0057-FamilySeer.md
##########
@@ -0,0 +1,212 @@
+- Feature Name: (FamilySeer: A new search method for Auto-scheduler)
+- Start Date: (2021-01-07)
+- RFC PR: [apache/tvm-rfcs#57](https://github.com/apache/tvm-rfcs/pull/57)
+- GitHub Issue: [apache/tvm#9875](https://github.com/apache/tvm/pull/9875)
+
+# Summary
+[summary]: #summary
+
+We propose FamilySeer, a new search method that optimizes search efficiency and quality of the Auto-scheduler. We introduce several features:
+
+- FamilySeer exploits the subgraph similarity to form a collection of subgraph families and constructs cost models at subgraph family basis to improve cost model accuracy.
+- We enable subgraphs within each family to share the search results within each tuning iteration, avoiding costly code measurements on real hardware and thus accelerating the search process to converge to optimal results.
+- We also make some general optimizations like enabling parallel measurement on single node with multiple GPUs and training the cost model on GPU.
+
+# Motivation
+[motivation]: #motivation
+
+Auto-scheduler (Ansor) uses code sketch and optimization rules to generate a large search space. The search space defined by Ansor has shown great opportunities and therefore the search quality and the search efficiency are determined by how we search the space.
+
+Ansor utilizes improved cost model and task scheduler to help explore the search space. The cost model analyzes and finds high-performance code transformations in the search space and the task scheduler allocates the time budget to different computation graphs. However, we find serval drawbacks to this approach:
+
+The accuracy of the cost model determines the search quality, but Ansor uses monolithic cost model to predict different computation graphs (subgraphs), resulting in an accuracy loss during tuning.
+
+The task scheduler allocates most of the time budget to subgraphs with most improving potential (i.e., those with the highest latency). This approach works well at the beginning of the autotuning. However, as the potential subgraph gradually reaches its peak performance with adequate time budget, other subgraphs have little time budget to reach its peak performance.
+
+The search process will at the end take a dozen of hours. This motivates us to find better way to explore the search space.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+We integrate our search method into Auto-Scheduler. Therefore, users only need to change some of the parameters to enable our search method.
+
+We use the code below in [Auto-scheduling a Neural Network for NVIDIA GPU](https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html#begin-tuning) as an example:
+
+```python
+#...
+
+# load all task and into tuner
+tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
+
+#define tuning option for tuner
+tune_option = auto_scheduler.TuningOptions(
+    num_measure_trials=200,  # change this to 20000 to achieve the best performance
+    runner=measure_ctx.runner,
+    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
+)
+
+# start tuning
+#tuner.tune(tune_option) #add new parameter to tune function 
+tuner.tune(tune_option,search_policy="sketch.xgb.family_op")
+
+```
+
+The `tuner` loads the `tune_option` into the `tune` function. There are several parameters in the `tune` function (Refer to class [Taskscheduler](https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=taskscheduler#tvm.auto_scheduler.TaskScheduler)). Users can enable our method by changing the `search_policy` parameter to `sketch.xgb.family_<family_algorithm>`. We currently provide two family algorithms as an option: `op` refers to classifying subgraphs based on the core operation, and `hash` refers to classifying subgraphs based on operation sequence. We recommend using `op` to achieve better performance.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Our search method consists of three steps:
+
+1. Identifying similar subgraphs
+```python
+def make_family_group(
+  tasks,
+  search_policy,
+):
+  """identify each subgraphs and group them into subgraph family.
+  """
+  if search_policy == "default":
+      search_policy = "sketch.xgb"
+
+  if isinstance(search_policy, str):
+      policy = search_policy.split(".")
+      if len(policy) == 2:
+          return {}
+      elif len(policy) == 3:
+          _, _, model_group = policy
+          _, class_type = model_group.split("_")
+      else:
+          raise ValueError("Invalid search policy: " + search_policy)
+      
+  family_group = {}
+  if class_type == "op":
+      for idx, task in enumerate(tasks):
+          task_layers = task.desc.split('_')
+          if task_layers[1] not in family_group:
+              family_group[task_layers[1]] = []
+              family_group[task_layers[1]].append(idx)
+          else:
+              family_group[task_layers[1]].append(idx)
+
+  elif class_type == "hash":
+      for idx, task in enumerate(tasks):
+          first = task.workload_key.find("[\"") + 2
+          end = task.workload_key.find("\",")
+          task_hash = task.workload_key[first:end]
+          if task_hash not in family_group:
+              family_group[task_hash] = []
+              family_group[task_hash].append(idx)
+          else:
+              family_group[task_hash].append(idx)
+
+  elif class_type == "ind":
+      for idx, task in enumerate(tasks):
+          if task.workload_key not in family_group:
+              family_group[task.workload_key] = []
+              family_group[task.workload_key].append(idx)
+          else:
+              family_group[task.workload_key].append(idx)
+      
+  if family_group is not None:
+      for key, value in family_group.items():
+          print("family group :", key, "---", value)
+
+  return family_group
+```
+
+We use static analyzing to classify the subgraphs(tasks) based on their attributes. 
+
+2. Constructing family cost model
+```python
+elif "family" in model_group:
+  # generate cost model for each family
+  cost_model_pool = []
+  for _,group in family_group.items():
+    if model_type == "xgb":
+      cost_model = XGBModel(
+          num_warmup_sample=len(group) * num_measures_per_round,
+          model_file=load_model_file,
+          adapative_training=adapative_training,
+      )
+      if load_model_file and os.path.isfile(load_model_file):
+        logger.info("TaskScheduler: Load pretrained model...")
+        cost_model.load(load_model_file)
+      elif load_log_file:
+        logger.info("TaskScheduler: Reload measured states and train the model...")
+        cost_model.update_from_file(load_log_file)
+    elif model_type == "random":
+        cost_model = RandomModel()
+    else:
+        raise ValueError("Invalid search policy: " + search_policy)
+    cost_model_pool.append(cost_model)
+  
+  # Bind each subgraph (task) to its family's cost model
+  search_policies = []
+  for task_idx, task in enumerate(tasks):
+    for group_idx, group in enumerate(family_group.values()):
+        if task_idx in group:
+          search_policies.append(
+            SketchPolicy(
+                task,
+                cost_model_pool[group_idx],
+                params=search_policy_params,
+                verbose=verbose,
+                init_search_callbacks=init_search_callbacks,
+            )
+          )
+
+```
+
+After identifying similar subgraphs, we return `family_group` (a mapping from each subgraph family to its task indices) and build one cost model per subgraph family.
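The essential effect of the binding loop can be shown in a few lines. This is a hypothetical, self-contained sketch — `DummyCostModel`, `family_group`, and `task_to_model` are illustrative stand-ins for `XGBModel` and the real task objects, not the TVM API:

```python
class DummyCostModel:
    """Stand-in for XGBModel: one instance per subgraph family."""
    def __init__(self, family_name):
        self.family_name = family_name

# A made-up grouping: tasks 0 and 2 are conv2d-like, task 1 is dense-like.
family_group = {"conv2d": [0, 2], "dense": [1]}

# One cost model per family, in family order.
cost_model_pool = [DummyCostModel(name) for name in family_group]

# Bind each task index to the model of the family that contains it.
task_to_model = {}
for model, members in zip(cost_model_pool, family_group.values()):
    for task_idx in members:
        task_to_model[task_idx] = model

print(task_to_model[0] is task_to_model[2])  # → True (shared family model)
```

Because tasks 0 and 2 hold the same model object, measurements gathered while tuning either one improve the predictions for both.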
+
+3. Foresee tuning
+
+```python
+def _tune_family_task(self, task_idx_groups, skip_measures_per_round):
+  """Tune the selected family of tasks for one round.
+  """
+  for task_idx in task_idx_groups:
+    # Run pre-tune callbacks
+    for callback in self.callbacks:
+      callback.pre_tune(self, task_idx)
+
+    measure_inputs, measure_results = self.search_policies[task_idx].continue_search_one_round(
+      skip_measures_per_round, self.measurer
+    )
+
+    # ...
+```
+
+Foresee tuning takes `task_idx_groups` (a list of subgraph families) and `skip_measures_per_round` as inputs and tunes every subgraph in the list.
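A minimal sketch of the round above, under stated assumptions: `DummyPolicy` and `tune_family_round` are made-up names replacing `SketchPolicy` and the scheduler internals, and the callback/measurer plumbing is omitted. The point shown is that every task in the chosen family gets `skip_measures_per_round` trials in one pass:

```python
class DummyPolicy:
    """Stand-in for SketchPolicy; only counts how many trials it ran."""
    def __init__(self):
        self.trials = 0

    def continue_search_one_round(self, num_measures, measurer=None):
        self.trials += num_measures
        return [], []  # placeholders for (measure_inputs, measure_results)

def tune_family_round(policies, task_idx_group, skip_measures_per_round):
    """Tune each task of one family for a single round."""
    for task_idx in task_idx_group:
        policies[task_idx].continue_search_one_round(skip_measures_per_round)

policies = [DummyPolicy() for _ in range(4)]
tune_family_round(policies, [0, 2, 3], 16)  # family = tasks 0, 2, 3
print([p.trials for p in policies])  # → [16, 0, 16, 16]
```

In the real implementation the family members share one cost model, so the measurements produced by each `continue_search_one_round` call immediately sharpen the predictions for the sibling tasks tuned later in the same round.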
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+When searching a larger search space (for example, with a larger batch size), FamilySeer performs similarly to, or sometimes worse than, Auto-scheduler. This is because a larger search space requires more time before the cost model can provide accurate predictions, and applying an inaccurate cost model during foresee tuning may spend the time budget on code transformations that yield no improvement.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+Auto-scheduler generates a sufficiently large search space, so searching that space efficiently is important. With FamilySeer, users can find the same optimal code within a smaller time budget. We hope our search method can be an alternative for those who want to obtain better optimized code under a limited time budget.
+
+# Prior art
+[prior-art]: #prior-art
+
+Please refer to [this paper](https://arxiv.org/abs/2201.00194).
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+Our search method is up for [discussion](https://discuss.tvm.apache.org/t/rfc-familyseer-a-new-search-method-for-auto-scheduler/11877).

Review comment:
       This is not an unresolved question. Unresolved questions should be the technical difficulty or the drawbacks of the proposed approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org