Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/11/03 13:55:25 UTC

[GitHub] [incubator-tvm] jcf94 opened a new pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

jcf94 opened a new pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen merged pull request #6830: [AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
tqchen merged pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830


   





[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721464517


   # Problem
   
   The computation gets a wrong result after layout rewrite, and this only occurs in the i386 CI.
   
   # Current Status
   
   After trying many different tests, I think I have finally found the reason: only the i386 CI uses llvm-4 to build TVM.
   
   This test sets the random seed to a fixed number so that AutoScheduler always generates the same schedule.
   i386 CI with llvm-4: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/4/nodes/253/steps/331/log/?start=0
   
   Same schedule in i386 CI with llvm-8: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/7/nodes/253/steps/331/log/?start=0
   
   The lowered result of TVM is exactly the same, so I think the only cause may be some bug in the LLVM codegen of llvm-4.
   
   To fully confirm it, we may need to compare their LLVM IR. I'm trying llvm-4 in my local environment to see if this bug can be reproduced.
   
   cc @merrymercy @comaniac @tqchen @masahi 





[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6830: [AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721582119


   This problem can be reproduced in my local environment with the ci-i386 docker image.
   
   It seems that floating point operations in a 32-bit environment tend to be less accurate than in a 64-bit one?
   
   I've run more tests on different llvm versions. Codegen results with a higher llvm version can still hit the accuracy problem, but with lower probability. In an x86_64 environment, all llvm versions worked well even with atol and rtol set to 1e-7.
   
   For now, a better way to fix this may still be to set bigger atol and rtol values.
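
   As a rough illustration of why the tolerance matters, here is a minimal sketch using NumPy's `assert_allclose`, which passes when `|actual - expected| <= atol + rtol * |expected|`. The helper name `check_close`, the sample arrays, the 1e-5 perturbation and the looser 1e-4 tolerance are all assumptions made up for this example; they are not the values from the failing test or from this PR.

```python
import numpy as np


def check_close(actual, expected, rtol, atol):
    """Return True if assert_allclose accepts the pair, False otherwise.

    Hypothetical helper for illustration only.
    """
    try:
        np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)
        return True
    except AssertionError:
        return False


# Reference values, e.g. what a float32 kernel is expected to produce.
expected = np.array([1.0, 2.0, 3.0], dtype="float32")

# Assumed rounding noise of about 1e-5, standing in for the small
# differences a different instruction sequence might introduce.
actual = expected + np.float32(1e-5)

# Tight tolerance (1e-7, as mentioned above): the 1e-5 difference is rejected.
tight_ok = check_close(actual, expected, rtol=1e-7, atol=1e-7)

# Looser tolerance (1e-4, an assumed value): the same difference is accepted.
loose_ok = check_close(actual, expected, rtol=1e-4, atol=1e-4)

print(tight_ok, loose_ok)
```

   Loosening the tolerance trades sensitivity for robustness, so the value would have to stay small enough that a genuinely wrong layout rewrite still fails the check.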





[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6830: [AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721582119


   This problem can be reproduced in my local environment with the ci-i386 docker image.
   
   It seems that floating point operations in a 32-bit environment tend to be less accurate than in a 64-bit one?
   
   I've run more tests on different llvm versions. Codegen results with a higher llvm version can still hit the accuracy problem, but with lower probability. In an x86_64 environment, even with atol = rtol = 1e-7, all llvm versions worked well.
   
   For now, a better way to fix this may still be to set bigger atol and rtol values.





[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721464517


   # Problem
   
   The computation gets a wrong result after layout rewrite, and this only occurs in the i386 CI.
   
   # Current Status
   
   After trying many different tests, I think I have finally found the reason: only the i386 CI uses llvm-4 to build TVM.
   
   This test sets the random seed to a fixed number so that AutoScheduler generates the same schedule.
   i386 CI with llvm-4: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/4/nodes/253/steps/331/log/?start=0
   
   Same schedule in i386 CI with llvm-8: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/7/nodes/253/steps/331/log/?start=0
   
   The lowered result of TVM is exactly the same, so I think the only cause may be some bug in the LLVM codegen of llvm-4.
   
   To fully confirm it, we may need to compare their LLVM IR. I'm trying llvm-4 in my local environment to see if this bug can be reproduced.
   
   cc @merrymercy @comaniac @tqchen @masahi 





[GitHub] [incubator-tvm] jcf94 commented on pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721464517


   # Problem
   
   The computation gets a wrong result after layout rewrite, and this only occurs in the i386 CI.
   
   # Current Status
   
   After trying many different tests, I think I have finally found the reason.
   
   Only the i386 CI uses llvm-4 to build TVM.
   
   i386 CI with llvm-4: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/4/nodes/253/steps/331/log/?start=0
   
   i386 CI with llvm-8: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/7/nodes/253/steps/331/log/?start=0





[GitHub] [incubator-tvm] jcf94 edited a comment on pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 edited a comment on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721464517


   # Problem
   
   The computation gets a wrong result after layout rewrite, and this only occurs in the i386 CI.
   
   # Current Status
   
   After trying many different tests, I think I have finally found the reason: only the i386 CI uses llvm-4 to build TVM.
   
   This test sets the random seed to a fixed number so that AutoScheduler generates the same schedule.
   i386 CI with llvm-4: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/4/nodes/253/steps/331/log/?start=0
   
   Same schedule in i386 CI with llvm-8: https://ci.tlcpack.ai/blue/rest/organizations/jenkins/pipelines/tvm/branches/PR-6830/runs/7/nodes/253/steps/331/log/?start=0
   
   The lowered result of TVM is exactly the same, so I think the only cause may be some bug in the LLVM codegen of llvm-4.
   
   To fully confirm it, we may need to compare their LLVM IR. I'm trying llvm-4 in my local environment to see if this bug can be reproduced.





[GitHub] [incubator-tvm] jcf94 commented on pull request #6830: [WIP][AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
jcf94 commented on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721582119


   This problem can be reproduced in my local environment with the ci-i386 docker image.
   
   I've run more tests on different llvm versions. Codegen results with a higher llvm version can still hit the accuracy problem, but with lower probability.
   
   It seems that floating point operations in a 32-bit environment tend to be less accurate than in a 64-bit one? For now, a better way to fix this may still be to set bigger atol and rtol values.





[GitHub] [incubator-tvm] tqchen commented on pull request #6830: [AutoScheduler] Bug fix for layout rewrite CI error in i386

Posted by GitBox <gi...@apache.org>.
tqchen commented on pull request #6830:
URL: https://github.com/apache/incubator-tvm/pull/6830#issuecomment-721757013


   Thanks @jcf94 for the timely fix and in-depth analysis.

