You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/03/05 21:58:24 UTC

[GitHub] [incubator-tvm] comaniac opened a new pull request #4995: [AutoTVM] Avoid using RPC for LocalRunner

comaniac opened a new pull request #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995
 
 
   ### Motivation and Summary
   
   `LocalRunner`, which measures the runtime of an op with a certain schedule config on the host machine directly, is one of the two runners in AutoTVM. However, `LocalRunner` was derived from `RPCRunner` and launched a local RPC server. The reason for this implementation was to have a unified interface and logic for both runners, but it introduces two problems:
   
   1. Overhead.
   Although local RPC session doesn't really have to send the built binary via network, it still has 1) the RPC connection overhead, and 2) an additional binary copy overhead (`RPCRunner` will upload the locally built binary to remote RPC server. In the local RPC case, it means a copy from `/tmp`to `pwd`). 
   
   2. Reliability.
   As many people have reported in the discuss, the local RPC connection may be dropped or unstable. One possible reason may come from the tornado package but it's hard to be identified and fixed. Anyway, when we are using `LocalRunner`, it doesn't make any sense to see an error or warning regrad to RPC connection.
   
   This PR refactors AutoTVM `Runner` and fully decouples `LocalRunner` and `RPCRunner`. In summary:
   * Now both `LocalRunner` and `RPCRunner` are derived from `Runner`.
   * Common logic and interfaces such as "prepare golden reference for correctness checking" and "run N configs in parallel" are lifted to base `Runner` class.
   * Each derived runner only needs to specify how to acquire TVM context (local or remote) and how to run oneconfig.
   * No change on the user interfaces and APIs.
   
   ### Evaluation
   
   Since user interface and underlying measurement remain the same, I just tested the first task extracted from ResNet-18 on CPU for evaluation. Both tests use `LocalRunner`.
   
   Without this PR:
   
   ```
   [Task  1/12]  Current/Best:   20.81/ 169.89 GFLOPS | Progress: (252/252) | 556.76 s Done
   ```
   
   With this PR:
   
   ```
   [Task  1/12]  Current/Best:   17.89/ 150.35 GFLOPS | Progress: (252/252) | 192.78 s Done
   ```
   
   While I think the performance difference should be due to the measurement error, we can clearly see that this PR is capable of reducing the tuning time.
   
   ### Issue
   
   For VTA using `RPCRunner`, it will reprogram FPGA every time before a measurement, but I am not sure if we still need this step for the real local runner. Please help review and provide your suggestions (cc @tmoreau89).
   
   @merrymercy @eqy @kevinthesun please help to review. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] merrymercy commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
merrymercy commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595508666
 
 
   Did you test GPU tuning? There are runtime issues like we cannot do fork after initializing CUDA runtime

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] tqchen commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595486181
 
 
   The main reason to adopt the RPC runner is for the process isolation -- because we cannot guarantee that the local runner won't crash and as a result crash the AutoTVM process itself. 
   
   It might be useful though to have a persistent RPC server over the tuning process 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] tqchen edited a comment on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596013185
 
 
   One thing I noticed about the old LocalRunner is that a new RPC server and tracker seems is created each time `set_task` is called https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/measure/measure_methods.py#L333
   
   Depending on how frequent that is called, perhaps we should change the server and tracker creation to only  once at the Local runner starting time, which should reduce the overhead.
   
   @comaniac @merrymercy 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] jmorrill commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
jmorrill commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596002645
 
 
   @comaniac When I was working on the Windows RPC server, my first design was to run the trials all in the same process as Windows doesn't `fork(...)`.  It did indeed have a speed improvement.
   
   After running CUDA autotune a while it looked like there was a GPU memory leak (some trial made it jump hundreds of megs).  I spent a long time looking for a leak in the C++ code, but couldn't find anything.  With my very limited understanding, it seemed as if the failed/slow cuda kernels it would test, were themselves leaking.  Only killing the process would free the GPU memory.
   
   I abandoned the in-process running of trials and moved to a similar design as the Linux current implementation.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596008979
 
 
   > @comaniac When I was working on the Windows RPC server, my first design was to run the trials all in the same process as Windows doesn't `fork(...)`. It did indeed have a speed improvement.
   > 
   > After running CUDA autotune a while it looked like there was a GPU memory leak (some trial made it jump hundreds of megs). I spent a long time looking for a leak in the C++ code, but couldn't find anything. With my very limited understanding, it seemed as if the failed/slow cuda kernels it would test, were themselves leaking. Only killing the process would free the GPU memory.
   > 
   > I abandoned the in-process running of trials and moved to a similar design as the Linux current implementation.
   
   Thanks for the experience sharing. It sounds reasonable.
   The current implementation in this PR lets every executor forked process acquire a new CUDA context and this is the only solution I have right now to avoid some of the initialization errors. Ideally the process should be killed right after the child process finished so that the next job (with another new forked process) could start from a clean context. However, apparently it doesn't work as I expected.
   
   Related issue: https://stackoverflow.com/questions/22950047/cuda-initialization-error-after-fork 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] tqchen edited a comment on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596013185
 
 
   One thing I noticed about the old LocalRunner is that a new RPC server and tracker seems is created each time `set_task` is called https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/measure/measure_methods.py#L333
   
   Depending on how frequent that is called, perhaps we should change the server and tracker creation to only  once at the Local runner starting time, which should reduce the overhead.This might fix the perf issue brought up in this thread without having to bring a separate codepath for LocalRunner
   
   @comaniac @merrymercy 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595512579
 
 
   > Did you test GPU tuning? There are runtime issues like we cannot do fork after initializing CUDA runtime
   
   That's a good point. I'll find a GPU instance to test it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595998520
 
 
   @merrymercy I tested the local runner on Nvidia T4 and observed the issue you mentioned, so I put the get device attributes logic to a separate process so that the master process will not access the GPU context at all.
   
   On the other hand, there's still an issue. The tuning process may still encounter initialization errors after a number of trials. I need to further investigate the root cause.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596014950
 
 
   > One thing I noticed about the old LocalRunner is that a new RPC server and tracker seems is created each time `set_task` is called https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/measure/measure_methods.py#L333
   > 
   > Depending on how frequent that is called, perhaps we should change the server and tracker creation to only once at the Local runner starting time, which should reduce the overhead.
   > 
   > @comaniac @merrymercy
   
   `set_task` is called only once per task so the overhead should not be a problem, especially I usually only test one task. Instead, I worry the stability of launching RPC server/tracker from master process. @merrymercy previously also found that this may cause the hanging problem. Maybe @merrymercy could comment more details.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] masahi commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595563667
 
 
   AMDGPU also needs RPC runner for autotvm, if I remember correctly

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
comaniac commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-595503657
 
 
   > The main reason to adopt the RPC runner is for the process isolation -- because we cannot guarantee that the local runner won't crash and as a result crash the AutoTVM process itself.
   > 
   > It might be useful though to have a persistent RPC server over the tuning process
   
   While I do agree with you that we should make sure the failure of a measurement won't crash the tuning process itself, I still don't think local RPC is the only and the best solution when considering its overhead and reliability. For example, we can use a separate thread to run `time_evaluator` locally and catch the exception if failed, just like what we did for task extraction.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] tqchen commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #4995: [AutoTVM] Avoid using RPC for LocalRunner
URL: https://github.com/apache/incubator-tvm/pull/4995#issuecomment-596013185
 
 
   One thing I noticed is that a new RPC server and tracker seems is created each time `set_task` is called https://github.com/apache/incubator-tvm/blob/master/python/tvm/autotvm/measure/measure_methods.py#L333
   
   Depending on how frequent that is called, perhaps we should change the server and tracker creation to only  once at the Local runner starting time, which should reduce the overhead.
   
   @comaniac @merrymercy 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] tqchen closed pull request #4995: [AutoTVM] Avoid using RPC for LocalRunner

Posted by GitBox <gi...@apache.org>.
tqchen closed pull request #4995:
URL: https://github.com/apache/incubator-tvm/pull/4995


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org