Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/09/22 10:40:21 UTC

[GitHub] [incubator-tvm] jcf94 opened a new pull request #6529: [Ansor] Parallel the InitPopulation

jcf94 opened a new pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529


   Some small fixes:
   - Add a read-write lock for SplitFactorizationMemo (a minimal sketch of the locking scheme is shown below)
   - Parallelize the InitPopulation step
   
   @FrozenGene I tested this parallel implementation on my server; the running time of `tests/python/unittest/test_auto_scheduler_search_policy.py` dropped from 33s to 27s.
   
   cc @merrymercy @comaniac 
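   
   The lock is a small condition-variable based reader-writer lock. Only `GetRead()`/`GetWrite()` appear in the review diff below; the `Unlock*` names and the exact wake-up policy in this sketch are assumptions for illustration:
   ```
   #include <condition_variable>
   #include <mutex>
   
   // Minimal sketch of the reader-writer lock guarding SplitFactorizationMemo's cache.
   class ReadWriteLock {
    public:
     void GetRead() {
       std::unique_lock<std::mutex> lock(cv_mutex_);
       // Readers may proceed as long as no writer holds the lock.
       cv_.wait(lock, [this]() { return !is_writing_; });
       read_count_++;
     }
     void UnlockRead() {
       std::unique_lock<std::mutex> lock(cv_mutex_);
       if (--read_count_ == 0) {
         cv_.notify_all();  // a pending writer may now enter
       }
     }
     void GetWrite() {
       std::unique_lock<std::mutex> lock(cv_mutex_);
       // A writer may proceed only when no one is reading or writing.
       cv_.wait(lock, [this]() { return read_count_ == 0 && !is_writing_; });
       is_writing_ = true;
     }
     void UnlockWrite() {
       std::unique_lock<std::mutex> lock(cv_mutex_);
       is_writing_ = false;
       cv_.notify_all();  // wake pending readers and writers
     }
   
    private:
     std::mutex cv_mutex_;
     std::condition_variable cv_;
     int read_count_ = 0;
     bool is_writing_ = false;
   };
   ```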


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] jcf94 commented on a change in pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#discussion_r493160895



##########
File path: src/auto_scheduler/search_policy/utils.cc
##########
@@ -414,19 +414,52 @@ void PruneInvalidState(const SearchTask& task, Array<State>* states) {
   }
 }
 
+/********** SplitFactorizationMemo **********/
+
+void SplitFactorizationMemo::ReadWriteLock::GetRead() {

Review comment:
       Updated the doc string and added the comments.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] jcf94 commented on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-697100009


   > We probably should benchmark only the sample initial population part and compare the actual speedup with the ideal speedup.
   > Then we can see how effective the parallelization is.
   
   Good suggestion. From the tuning log I can see that it was reduced from an average of 2~3 seconds:
   ```
   A CUDA test from the master branch
   ...
   Sample Initial Population       #s: 456 fail_ct: 2048   Time elapsed: 2.60
   ```
   to an average of 0.05~0.5 seconds:
   ```
   The same test from this branch (this run even tried more population samples)
   ...
   Sample Initial Population       #s: 648 fail_ct: 3448   Time elapsed: 0.17
   ```
   on my server.
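   
   For the two log lines above, that is roughly a 2.60 s / 0.17 s ≈ 15x speedup on this sample; the ideal speedup is bounded by the number of worker threads `parallel_for` uses, so the actual-vs-ideal comparison can be read off such pairs of log lines.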
   
   cc @FrozenGene 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-697100009


   > We probably should benchmark only the sample initial population part and compare the actual speedup with the ideal speedup.
   > Then we can see how effective the parallelization is.
   
   Good suggestion. From the tuning log I can see that it was reduced from an average of 2~3 seconds:
   ```
   A CUDA test from the master branch
   ...
   Sample Initial Population       #s: 456 fail_ct: 2048   Time elapsed: 2.60
   ```
   to an average of 0.05~0.5 seconds:
   ```
   The same test from this branch (this run even tried more population samples)
   ...
   Sample Initial Population       #s: 648 fail_ct: 3448   Time elapsed: 0.17
   ```
   on my server. cc @FrozenGene 
   
   And I'm wondering whether the Evolutionary Search part could also benefit from `parallel_for` in the same way (a rough sketch follows below)? @merrymercy @comaniac 
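   
   A rough sketch of how one mutation round might reuse the same pattern; `PickByScore`, `mutation_rules`, `population_size`, and the per-worker `rand_gens` here are just placeholders for illustration, not the actual EvolutionarySearch code:
   ```
   // Hypothetical sketch only: parallelize one mutation round with support::parallel_for,
   // mirroring the SampleInitPopulation change in this PR.
   std::vector<State> next_gen(population_size);
   support::parallel_for(0, population_size, [&](int index) {
     // Placeholder parent selection; each worker uses its own random generator.
     State tmp_s = PickByScore(population, &rand_gens[index]);
     for (const auto& rule : mutation_rules) {  // placeholder mutation rule set
       if (rule->Apply(this, &tmp_s, &rand_gens[index]) ==
           PopulationGenerationRule::ResultKind::kInvalid) {
         return;  // leave next_gen[index] undefined; count it as a failure afterwards
       }
     }
     next_gen[index] = std::move(tmp_s);
   });
   // Collect the defined states sequentially afterwards, exactly like temp_states above.
   ```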


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-696746963


   We probably should benchmark only the sample initial population part and compare the actual speedup with the ideal speedup.
   Then we can see how effective the parallelization is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] comaniac commented on a change in pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
comaniac commented on a change in pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#discussion_r492949744



##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -332,29 +333,45 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 }
 
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
-  int fail_ct = 0;
+  std::atomic<int> fail_ct(0);
   Array<State> out_states;
+  std::vector<std::mt19937> rand_seeds;
+  rand_seeds.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_seeds.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+        [this, &temp_states, &sketches, &rand_seeds, &fail_ct](int index) {
+      // Random choose a starting sketch
+      // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
+      // different potential on generating state with better performance

Review comment:
       Move the comments to the place of selecting sketches.

##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -332,29 +333,45 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 }
 
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
-  int fail_ct = 0;
+  std::atomic<int> fail_ct(0);
   Array<State> out_states;
+  std::vector<std::mt19937> rand_seeds;

Review comment:
       This is not rand_"seeds" but "rand_gens" to me.

##########
File path: src/auto_scheduler/search_policy/utils.cc
##########
@@ -414,19 +414,52 @@ void PruneInvalidState(const SearchTask& task, Array<State>* states) {
   }
 }
 
+/********** SplitFactorizationMemo **********/
+
+void SplitFactorizationMemo::ReadWriteLock::GetRead() {
+  std::unique_lock<std::mutex> lock(cv_mutex_);
+  cv_.wait(lock, [this](){ return !this->is_writing_; });
+  read_count_++;
+}
+
+void SplitFactorizationMemo::ReadWriteLock::GetWrite() {

Review comment:
       Better to comment the write condition (no one is reading or writing).

##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -332,29 +333,45 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 }
 
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
-  int fail_ct = 0;
+  std::atomic<int> fail_ct(0);
   Array<State> out_states;
+  std::vector<std::mt19937> rand_seeds;
+  rand_seeds.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_seeds.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+        [this, &temp_states, &sketches, &rand_seeds, &fail_ct](int index) {
+      // Random choose a starting sketch
+      // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
+      // different potential on generating state with better performance
+      State tmp_s = sketches[(rand_seeds[index])() % sketches.size()];
+
+      // Derivation rule based enumeration
+      bool valid = true;
+      for (const auto& rule : init_rules) {
+        if (rule->Apply(this, &tmp_s, &rand_seeds[index]) == PopulationGenerationRule::ResultKind::kInvalid) {
+          valid = false;
+          break;
+        }
       }
-    }
 
-    if (valid) {
-      out_states.push_back(std::move(tmp_s));
-    } else {
-      fail_ct++;
+      if (valid) {
+        temp_states[index] = std::move(tmp_s);
+      } else {
+        fail_ct++;

Review comment:
       It seems to me that this atomic add could be avoided as long as you simply put all schedules into `temp_states` and count the failures afterwards at L372.
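       Something like this, just to illustrate the idea (not tested):
       ```
       // After parallel_for returns, entries that failed a rule were never assigned and
       // stay undefined, so the failures can be counted sequentially here.
       for (auto& state : temp_states) {
         if (state.defined()) {
           out_states.push_back(std::move(state));
         } else {
           fail_ct++;  // a plain int is enough now; no atomic is needed
         }
       }
       ```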

##########
File path: src/auto_scheduler/search_policy/utils.cc
##########
@@ -414,19 +414,52 @@ void PruneInvalidState(const SearchTask& task, Array<State>* states) {
   }
 }
 
+/********** SplitFactorizationMemo **********/
+
+void SplitFactorizationMemo::ReadWriteLock::GetRead() {

Review comment:
       Better to comment the read condition (no one is writing).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] FrozenGene commented on a change in pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
FrozenGene commented on a change in pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#discussion_r493190787



##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -334,28 +335,44 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
   int fail_ct = 0;
   Array<State> out_states;
+  std::vector<std::mt19937> rand_gens;
+  rand_gens.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_gens.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+                          [this, &temp_states, &sketches, &rand_gens](int index) {
+                            // Random choose a starting sketch
+                            // TODO(jcf94, merrymercy): Maybe choose sketches in different
+                            // possibility for they may have different potential on generating state
+                            // with better performance
+                            State tmp_s = sketches[(rand_gens[index])() % sketches.size()];
+                            // Derivation rule based enumeration
+                            bool valid = true;
+                            for (const auto& rule : init_rules) {
+                              if (rule->Apply(this, &tmp_s, &rand_gens[index]) ==
+                                  PopulationGenerationRule::ResultKind::kInvalid) {
+                                valid = false;
+                                break;
+                              }
+                            }
+                            if (valid) {
+                              temp_states[index] = std::move(tmp_s);
+                            }
+                          });
+
+    for (int i = 0; i < out_size; i++) {
+      if (temp_states[i].defined()) {
+        out_states.push_back(std::move(temp_states[i]));

Review comment:
       Do we need a lock here when we write elements into `out_states`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] jcf94 commented on a change in pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#discussion_r493162832



##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -332,29 +333,45 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 }
 
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
-  int fail_ct = 0;
+  std::atomic<int> fail_ct(0);
   Array<State> out_states;
+  std::vector<std::mt19937> rand_seeds;
+  rand_seeds.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_seeds.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+        [this, &temp_states, &sketches, &rand_seeds, &fail_ct](int index) {
+      // Random choose a starting sketch
+      // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
+      // different potential on generating state with better performance

Review comment:
       Yes, this comment is just for the `State tmp_s = sketches[(rand_seeds[index])() % sketches.size()];`, which randomly chooses a sketch as the starting point for this InitPopulation thread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] jcf94 commented on a change in pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
jcf94 commented on a change in pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#discussion_r493160895



##########
File path: src/auto_scheduler/search_policy/utils.cc
##########
@@ -414,19 +414,52 @@ void PruneInvalidState(const SearchTask& task, Array<State>* states) {
   }
 }
 
+/********** SplitFactorizationMemo **********/
+
+void SplitFactorizationMemo::ReadWriteLock::GetRead() {

Review comment:
       Updated the doc string and added the comments.

##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -332,29 +333,45 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 }
 
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
-  int fail_ct = 0;
+  std::atomic<int> fail_ct(0);
   Array<State> out_states;
+  std::vector<std::mt19937> rand_seeds;
+  rand_seeds.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_seeds.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+        [this, &temp_states, &sketches, &rand_seeds, &fail_ct](int index) {
+      // Random choose a starting sketch
+      // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
+      // different potential on generating state with better performance

Review comment:
       Yes, this comment is just for the `State tmp_s = sketches[(rand_seeds[index])() % sketches.size()];`, which randomly chooses a sketch as the starting point for this InitPopulation thread.

##########
File path: src/auto_scheduler/search_policy/sketch_policy.cc
##########
@@ -334,28 +335,44 @@ Array<State> SketchPolicyNode::GenerateSketches() {
 Array<State> SketchPolicyNode::SampleInitPopulation(const Array<State>& sketches, int out_size) {
   int fail_ct = 0;
   Array<State> out_states;
+  std::vector<std::mt19937> rand_gens;
+  rand_gens.reserve(out_size);
+  for (int i = 0; i < out_size; i++) {
+    rand_gens.push_back(std::mt19937(rand_gen()));
+  }
   auto tic_begin = std::chrono::high_resolution_clock::now();
 
   while (static_cast<int>(out_states.size()) < out_size && fail_ct < out_size) {
-    // Random choose a starting sketch
-    // TODO(jcf94, merrymercy): Maybe choose sketches in different possibility for they may have
-    // different potential on generating state with better performance
-    State tmp_s = sketches[(rand_gen)() % sketches.size()];
-
-    // Derivation rule based enumeration
-    bool valid = true;
-    for (const auto& rule : init_rules) {
-      if (rule->Apply(this, &tmp_s) == PopulationGenerationRule::ResultKind::kInvalid) {
-        valid = false;
-        break;
+    std::vector<State> temp_states(out_size);
+
+    support::parallel_for(0, out_size - out_states.size(),
+                          [this, &temp_states, &sketches, &rand_gens](int index) {
+                            // Random choose a starting sketch
+                            // TODO(jcf94, merrymercy): Maybe choose sketches in different
+                            // possibility for they may have different potential on generating state
+                            // with better performance
+                            State tmp_s = sketches[(rand_gens[index])() % sketches.size()];
+                            // Derivation rule based enumeration
+                            bool valid = true;
+                            for (const auto& rule : init_rules) {
+                              if (rule->Apply(this, &tmp_s, &rand_gens[index]) ==
+                                  PopulationGenerationRule::ResultKind::kInvalid) {
+                                valid = false;
+                                break;
+                              }
+                            }
+                            if (valid) {
+                              temp_states[index] = std::move(tmp_s);
+                            }
+                          });
+
+    for (int i = 0; i < out_size; i++) {
+      if (temp_states[i].defined()) {
+        out_states.push_back(std::move(temp_states[i]));

Review comment:
       This part is outside of the `parallel_for` block, so it is fine.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-696746963






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] merrymercy commented on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
merrymercy commented on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-696746963


   We probably should benchmark only the sample initial population part and compare the actual speedup with the ideal speedup.
   Then we can see how effective the parallelization is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] merrymercy edited a comment on pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
merrymercy edited a comment on pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529#issuecomment-696746963


   We probably should benchmark only the sample initial population part and compare the actual speedup with the ideal speedup.
   Then we can see how effective the parallelization is.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen merged pull request #6529: [Ansor] Parallel the InitPopulation

Posted by GitBox <gi...@apache.org>.
tqchen merged pull request #6529:
URL: https://github.com/apache/incubator-tvm/pull/6529


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


