Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/19 08:15:25 UTC

[GitHub] [tvm] antinucleon opened a new issue #7135: [Bug] AutoScheduler / Fuse Pass bug

antinucleon opened a new issue #7135:
URL: https://github.com/apache/tvm/issues/7135


   If we run the auto-scheduler on BERT, e.g. with these scripts (https://github.com/octoml/Apple-M1-BERT), we will see error messages like the following during compilation: 
   
   ```
   Extract tasks...
   Compile...
   -----------------------------------
   Cannot find tuned schedules for target=metal -keys=metal,gpu -max_num_threads=256, workload_key=["ec4f7d9b3c9680b55f74f8646223586b"]. A fallback TOPI schedule is used, which may bring great performance regression or even compilation failure. Compute DAG info:
   placeholder = PLACEHOLDER [1, 768]
   placeholder = PLACEHOLDER [768, 768]
   T_dense(i, j) += (placeholder[i, k]*placeholder[j, k])
   ```
   
   However, with the July codebase, this message does not appear. The effect of this bug is significant:
   
   On an NVIDIA T4, BERT inference time with the July codebase is 9 ms, while with the current main branch it is 13 ms (with similar estimated times from the auto-scheduler).
   
   Unfortunately, I don't have the bandwidth to fix this bug in the coming weeks. 
   
   Contributions are welcome. 
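   For reference, a minimal sketch of how such a script typically applies the tuning logs at compile time (the function and argument names here are placeholders; the actual scripts are in the repo linked above):

```python
def compile_with_tuning_logs(mod, params, target, log_file):
    """Sketch: build a Relay module with auto_scheduler tuning logs applied.
    mod/params/target/log_file are placeholders for the caller's values."""
    import tvm
    from tvm import relay, auto_scheduler

    # Replay the best tuned schedules recorded in log_file; tasks without
    # a matching record fall back to the default TOPI schedule, which is
    # what triggers the warning shown above.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3,
            config={"relay.backend.use_auto_scheduler": True},
        ):
            return relay.build(mod, target=target, params=params)
```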
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748527709


   Could you clarify whether you used the tuning log from July with the current upstream directly, or re-tuned the model? It looks like the task shown in the message hasn't been tuned.





[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748535619


   nvm. I noticed that the transformers package has to be version 3.0. I can reproduce the issue now.





[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748530441


   I see. Then this should be expected, because this PR (https://github.com/apache/tvm/pull/6903) changes the way tasks are extracted. In your case, if there is only one mismatched task, you can tune that task alone and put the tuning logs together.





[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748537304


   OK, so the root cause is that your script uses `opt_level=0` when building the model, while auto_scheduler task extraction uses `opt_level=3`. I changed `opt_level` in `search_dense_gpu.py` to 3, and here is what I got:
   
   ```
   Compile...
   -----------------------------------
   Cannot find tuned schedules for target=metal -keys=metal,gpu -max_num_threads=256, workload_key=["13da82b16db5a9fde8953f4c5667d2e4"]. A fallback TOPI schedule is used, which may bring great performance regression or even compilation failure. Compute DAG info:
   placeholder = PLACEHOLDER [1, 768]
   placeholder = PLACEHOLDER [768, 768]
   T_dense(i, j) += (placeholder[i, k]*placeholder[j, k])
   placeholder = PLACEHOLDER [768]
   T_add(ax0, ax1) = (T_dense[ax0, ax1] + placeholder[ax1])
   T_minimum(ax0, ax1) = min(T_add[ax0, ax1], 9f)
   T_maximum(ax0, ax1) = max(T_minimum[ax0, ax1], -9f)
   T_fast_tanh(ax0, ax1) = ((T_maximum[ax0, ax1]*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0,  ..(OMITTED).. *T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*1.19826e-06f) + 0.000118535f)) + 0.00226843f)) + 0.00489353f))
   ```
   
   The task hash code `13da82b16db5a9fde8953f4c5667d2e4` matches one of the extracted tasks from the model:
   
   ```
   ========== Task 9  (workload key: ["13da82b16db5a9fde8953f4c5667d2e4"]) ==========
   placeholder = PLACEHOLDER [1, 768]
   placeholder = PLACEHOLDER [768, 768]
   T_dense(i, j) += (placeholder[i, k]*placeholder[j, k])
   placeholder = PLACEHOLDER [768]
   T_add(ax0, ax1) = (T_dense[ax0, ax1] + placeholder[ax1])
   T_minimum(ax0, ax1) = min(T_add[ax0, ax1], 9f)
   T_maximum(ax0, ax1) = max(T_minimum[ax0, ax1], -9f)
   T_fast_tanh(ax0, ax1) = ((T_maximum[ax0, ax1]*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0,  ..(OMITTED).. *T_maximum[ax0, ax1])*(((T_maximum[ax0, ax1]*T_maximum[ax0, ax1])*1.19826e-06f) + 0.000118535f)) + 0.00226843f)) + 0.00489353f))
   ```
   
   In conclusion, this is not really a bug, but we may need to come up with a solution to further improve the task extraction configuration. I'm closing this issue for now, and we could have an RFC on the discuss forum.
   
   cc @merrymercy 
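   As a toy illustration of the mechanism (not actual TVM code): the workload key is derived from the fused compute DAG, so extracting tasks at `opt_level=3` but building at `opt_level=0` yields different keys, and the tuning-log lookup misses:

```python
import hashlib

def toy_workload_key(compute_dag):
    # Stand-in for auto_scheduler's workload hashing: the key is a hash
    # of the compute definition, so any change in fusion changes the key.
    return hashlib.md5(compute_dag.encode()).hexdigest()

# opt_level=3 (task extraction): dense + bias add + fast_tanh are fused.
extracted = toy_workload_key("dense;add;minimum;maximum;fast_tanh")
# opt_level=0 (build script): dense is compiled unfused.
compiled = toy_workload_key("dense")

assert extracted != compiled  # lookup misses -> fallback TOPI schedule
```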





[GitHub] [tvm] antinucleon commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

antinucleon commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748527963


   The task is tuned, but for some reason the location of a fast tanh
   function is different during fusion, which made the matching fail.
   
   On Sat, Dec 19, 2020 at 13:26 Cody Yu <no...@github.com> wrote:
   
   > Could you clarify that did you use the tuning log from July for the
   > current upstream directly or did you re-tune the model? Looks like the task
   > shown by the message hasn't been tuned.
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/tvm/issues/7135#issuecomment-748527709>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAJTLXTRPLH3P5IGHURHP33SVULB7ANCNFSM4VCD3BHQ>
   > .
   >
   -- 
   Bing Xu
   





[GitHub] [tvm] comaniac removed a comment on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac removed a comment on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748530441


   I see. Then this should be expected, because this PR (https://github.com/apache/tvm/pull/6903) changes the way tasks are extracted. In your case, if there is only one mismatched task, you can tune that task alone and put the tuning logs together.





[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748535061


   OK, I'll take a look next week. By the way, I tried the repo but failed to convert the PyTorch BERT model to Relay. I guess it's due to an incompatible PyTorch (1.7.0) or model version. The error I got when running `dump_pt.py` was a missing Relay op, `prim::DictConstruct`. It would be good if you could provide a minimal reproducible example for this issue so that people can work on it directly.





[GitHub] [tvm] antinucleon commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

antinucleon commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748532455


   I don't think I expressed this clearly (I'm on my phone). With the current
   upstream, there is a mismatch between the auto-scheduler and the fuse
   pass. This bug does not exist in the 0.7 codebase.
   
   Yes, re-tuning an extra task is a way to hide the bug, but it is not a
   solution.
   
   On Sat, Dec 19, 2020 at 13:56 Cody Yu <no...@github.com> wrote:
   
   > So you meant that task has different hash numbers between extraction and
   > compilation? I could take a look next week if so.
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/tvm/issues/7135#issuecomment-748530587>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAJTLXR4MTK7EDD25QYXY7TSVUOSFANCNFSM4VCD3BHQ>
   > .
   >
   -- 
   Bing Xu
   





[GitHub] [tvm] comaniac commented on issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac commented on issue #7135:
URL: https://github.com/apache/tvm/issues/7135#issuecomment-748530587


   So you mean that the task has different hash values between extraction and compilation? If so, I could take a look next week.





[GitHub] [tvm] comaniac closed issue #7135: [Bug] AutoScheduler / Fuse Pass bug

comaniac closed issue #7135:
URL: https://github.com/apache/tvm/issues/7135


   

