Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/12/11 01:54:10 UTC

[GitHub] [tvm] masahi opened a new issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi opened a new issue #9713:
URL: https://github.com/apache/tvm/issues/9713


   Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposal discussions, roadmaps, and bug tracking. You are always welcome to post on the forum first :smile_cat:
   
   Issues that are inactive for a period of time may get closed. We adopt this policy so that we won't lose track of actionable issues that may fall to the bottom of the pile. Feel free to open a new one if you feel there is an additional problem that needs attention after an old one gets closed.
   
   - [ ] S0. Reason: For example, a blocked PR or a feature issue
   
   - [ ] S1. Tag of nightly build: TAG. Docker Hub: https://hub.docker.com/layers/tlcpackstaging/ci_cpu/...
   
   - [ ] S2. The nightly is built on TVM commit: TVM_COMMIT. Detailed info can be found here: https://ci.tlcpack.ai/blue/organizations/jenkins/docker-images-ci%2Fdaily-docker-image-rebuild/detail/daily-docker-image-rebuild/....
   
   - [ ] S3. Testing the nightly image on ci-docker-staging: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/...
   
   - [ ] S4. Retag TAG to VERSION:
   ```
   docker pull tlcpackstaging/IMAGE_NAME:TAG
   docker tag tlcpackstaging/IMAGE_NAME:TAG tlcpack/IMAGE_NAME:VERSION
   docker push tlcpack/IMAGE_NAME:VERSION
   ```
   
   - [ ] S5. Check if the new tag is really there: https://hub.docker.com/u/tlcpack
   
   - [ ] S6. Submit a PR updating the IMAGE_NAME version on Jenkins
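The S4 retag step is a fixed pull/tag/push sequence, so it can be scripted. The sketch below is a hypothetical helper (`retag_commands` and `run_retag` are illustrative names, not part of TVM's tooling). It assumes only that the `docker` CLI is on `PATH` and that you are logged in with push rights to `tlcpack`. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def retag_commands(image_name: str, tag: str, version: str) -> List[List[str]]:
    """Build the S4 docker commands that promote a staging tag to a release version."""
    staging = f"tlcpackstaging/{image_name}:{tag}"
    release = f"tlcpack/{image_name}:{version}"
    return [
        ["docker", "pull", staging],
        ["docker", "tag", staging, release],
        ["docker", "push", release],
    ]


def run_retag(image_name: str, tag: str, version: str, dry_run: bool = True) -> None:
    """Print (dry run) or execute the retag commands, stopping at the first failure."""
    for cmd in retag_commands(image_name, tag, version):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if docker exits non-zero,
            # e.g. when the push is denied for lack of permissions.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: substitute the real TAG from S1 and the new VERSION.
    run_retag("ci_gpu", "20211219-100400-c0d326dbd", "VERSION")
```

A dry run by default makes it easy to review the exact image names before anything is pushed.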
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-992330893


   @manupa-arm Thanks, yes, I'm aware of the recent CI outage. I'll probably work on this next week.


[GitHub] [tvm] leandron commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

leandron commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-997010261


   I can deal with this request in #9762.


[GitHub] [tvm] leandron commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

leandron commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-997010418


   #9762 


[GitHub] [tvm] masahi edited a comment on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi edited a comment on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998503657


   Interesting! I guess I don't even need to build a new image locally anymore... So should I just send a PR that updates `docker/install/ubuntu_install_onnx.sh`, so that a new nightly image with the new PyTorch version gets built?
   
   I have more CI questions I want to ask. I recently joined Discord to learn about CI issues. Can we continue there?


[GitHub] [tvm] manupa-arm commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998488235


   @masahi you should not need to push images there. The ci-docker-build nightly job automatically builds all images from a nightly checkout of main.


[GitHub] [tvm] masahi closed issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi closed issue #9713:
URL: https://github.com/apache/tvm/issues/9713


   


[GitHub] [tvm] jiangjiajun commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

jiangjiajun commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1002361102


   @masahi I'm not sure why this error occurred. 
   
   I found earlier that there was a core dump problem when running `autotvm` with the paddle frontend, caused by PaddlePaddle's system signal capturing.
   
   To solve the problem, we released version 2.1.3, which adds a new function, `paddle.disable_signal_handler()`, to disable signal capturing.
   
   This function is called in the paddle frontend and in the tvmc frontend. We have tested it in TVM, and it works without problems:
   https://github.com/apache/tvm/blob/main/python/tvm/relay/frontend/paddlepaddle.py#L2270
   https://github.com/apache/tvm/blob/main/python/tvm/driver/tvmc/frontends.py#L279
   
   I checked all the code in TVM that imports paddle and found that the test code didn't call `paddle.disable_signal_handler()`, so I sent a new PR, https://github.com/apache/tvm/pull/9809. Could you test this change with PyTorch v1.10?
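The fix described above boils down to calling `paddle.disable_signal_handler()` right after importing paddle, before any TVM work runs in the same process. A minimal sketch of that pattern for test code follows. It assumes PaddlePaddle 2.1.3 or later, where per the comment above the function was introduced. `import_paddle_without_signal_handler` is a hypothetical helper name. It returns `None` when paddle is not installed, so optional tests can skip cleanly.

```python
import importlib
import warnings
from types import ModuleType
from typing import Optional


def import_paddle_without_signal_handler() -> Optional[ModuleType]:
    """Import paddle and turn off its signal capturing; return None if paddle is absent."""
    try:
        paddle = importlib.import_module("paddle")
    except ImportError:
        # PaddlePaddle is not installed; callers can skip paddle-specific tests.
        return None
    if hasattr(paddle, "disable_signal_handler"):
        # Stop Paddle's signal capturing from interfering with other code in this
        # process (the autotvm core dump described above).
        paddle.disable_signal_handler()
    else:
        warnings.warn("this paddle version has no disable_signal_handler(); 2.1.3+ assumed")
    return paddle


if __name__ == "__main__":
    paddle = import_paddle_without_signal_handler()
    print("paddle available:", paddle is not None)
```

Calling the helper once at the top of each test module that uses paddle keeps the behavior consistent with the frontends linked above.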


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1001741958


   @jiangjiajun I'm testing a new image with PT 1.10 and getting an error from paddle: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/182/pipeline/
   
   Any idea what's going on? I can run `from_paddle.py` locally without issues. The `free(): invalid pointer` error looks similar to a known issue with recent PT + LLVM, discussed in https://github.com/apache/tvm/issues/9362


[GitHub] [tvm] jiangjiajun commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

jiangjiajun commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1001835259


   It looks strange; I'll check this today.


[GitHub] [tvm] manupa-arm edited a comment on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm edited a comment on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-992320714


   Hi @masahi ,
   
   FYI, I'd be careful with S3.
   
   I was trying to use the image from S2 in S3, as mentioned here: https://github.com/apache/tvm/issues/9659
   Unfortunately, it timed out and left the image on the node. Only then did I realize that our jobs don't start on clean CI nodes in terms of images. Now the node does not have enough memory.
   
   The ci_cpu upgrade is currently blocked on this, and we don't have access to any of the nodes. cc: @leandron 
   https://github.com/apache/tvm/issues/9705.
   
   


[GitHub] [tvm] masahi closed issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi closed issue #9713:
URL: https://github.com/apache/tvm/issues/9713


   


[GitHub] [tvm] manupa-arm edited a comment on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm edited a comment on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998488235


   @masahi you should not need to push images there. The ci-docker-build nightly job automatically builds all images from a nightly checkout of main and pushes them there. The last component of the tag is the hash of the commit used to build the image.
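Given that description, a staging tag such as `20211219-100400-c0d326dbd` (the tag masahi tried to push in this thread) can be split into its parts. The comment above states only the meaning of the last field. The sketch assumes the first two fields are a `YYYYMMDD-HHMMSS` build timestamp, so treat that part as an assumption. `parse_staging_tag` and `StagingTag` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class StagingTag(NamedTuple):
    built_at: datetime  # assumed build timestamp (YYYYMMDD-HHMMSS)
    commit: str         # abbreviated hash of the TVM commit the image was built from


def parse_staging_tag(tag: str) -> StagingTag:
    """Split a nightly tag like '20211219-100400-c0d326dbd' into timestamp and commit."""
    date_part, time_part, commit = tag.split("-")
    built_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    # Abbreviated git hashes are lowercase hex; reject anything else early.
    if not commit or any(c not in "0123456789abcdef" for c in commit):
        raise ValueError(f"unexpected commit field in tag: {tag!r}")
    return StagingTag(built_at, commit)


if __name__ == "__main__":
    # Image reference from this thread; only the part after ':' is the tag.
    image = "tlcpackstaging/ci_gpu:20211219-100400-c0d326dbd"
    print(parse_staging_tag(image.split(":", 1)[1]))
```

The commit field can then be looked up in the TVM history to see exactly which `docker/` scripts went into an image.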


[GitHub] [tvm] jiangjiajun commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

jiangjiajun commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1003405177


   Paddle does not depend on PyTorch or LibTorch.
   
   Have you solved the problem now? There's an error in this link: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/185/pipeline/267/


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1003335331


   @jiangjiajun I was able to reproduce the error under the new gpu container.
   
   Does paddle use PyTorch internally, or link against `libtorch`? 


[GitHub] [tvm] manupa-arm commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998509502


   Yes, that should do it.
   
   S3 should catch any issues anyway, in case the updated image breaks the current tests.
   
   As a next step, we want to explore making every PR rebuild the images, using the tlcpackstaging images as a cache. That way we don't sacrifice correctness in the verification. PRs that involve Docker changes will be a bit slower because the cache is invalidated, but once another tlcpackstaging image is pushed, builds should go back to the normal behavior of using the cache. Something to discuss in the next meetup: @leandron @areusch 
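One reading of this proposal: every PR builds its images with the latest tlcpackstaging image as a layer cache. PRs that don't touch `docker/` then mostly reuse cached layers, and PRs that do touch it pay the rebuild cost. The sketch below is hypothetical and only builds the command line. The function names, the `docker/` path rule, the `docker/Dockerfile.<image>` naming, and `docker/` as build context are all assumptions, not TVM's CI code. `docker build --cache-from` is a real Docker flag, but the cache image usually has to be pulled first, or, with BuildKit, built with inline cache metadata.

```python
from typing import Iterable, List


def touches_docker(changed_files: Iterable[str]) -> bool:
    """True if the PR changes anything under docker/ (e.g. an install script)."""
    return any(path.startswith("docker/") for path in changed_files)


def build_command(image_name: str, cache_tag: str, local_tag: str) -> List[str]:
    """docker build invocation that reuses layers from the latest staging image."""
    cache_image = f"tlcpackstaging/{image_name}:{cache_tag}"
    return [
        "docker", "build",
        "--cache-from", cache_image,
        "--tag", local_tag,
        "--file", f"docker/Dockerfile.{image_name}",
        "docker/",
    ]


if __name__ == "__main__":
    changed = ["docker/install/ubuntu_install_onnx.sh"]
    # The image is always rebuilt; this only predicts whether some cached layers
    # will be invalidated (and the build therefore slower).
    print("cache partly invalidated:", touches_docker(changed))
    print(" ".join(build_command("ci_gpu", "20211219-100400-c0d326dbd", "ci_gpu:pr-local")))
```

Keeping the "always rebuild" path and the cache logic separate matches the point above: correctness comes from always rebuilding, and the cache only affects speed.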


[GitHub] [tvm] jiangjiajun commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

jiangjiajun commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1002902275


   Okay, I will test it in my environment.


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-997066748


   @leandron This is for updating the GPU image, so I think it should be independent.


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998503657


   Interesting! I guess I don't even need to build a new image locally anymore... So should I just send a PR that updates `docker/install/ubuntu_install_onnx.sh` to trigger a new nightly image with the new PyTorch version?
   
   I have more CI questions I want to ask. I recently joined Discord to learn about CI issues. Can we continue there?


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1002815525


   It's not clear which script caused the failure. The error message seems to indicate that the issue happens during `make html` (the tutorial scripts), and the only clue is:
   ```
   Extension error (sphinx_gallery.docs_resolv):
   
   Handler <function embed_code_links at 0x7f60e1b15f28> for event 'build-finished' threw an exception (exception: list indices must be integers or slices, not str)
   
   free(): invalid pointer
   
   --------------------------------------
   
   C++ Traceback (most recent call last):
   
   --------------------------------------
   
   0   paddle::framework::SignalHandle(char const*, int)
   1   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
   ```
   
   I thought the error was coming from `from_paddle.py`, but I couldn't reproduce it in a non-Docker environment. I'll try running this script in our GPU container.
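When a docs build fails like this, it can help to scan the saved console log for the signatures quoted above before guessing which tutorial is responsible. Below is a hypothetical triage helper. It assumes the Jenkins console output has been saved as text, and its patterns are copied from the log excerpt above. As later comments in this thread show, a Paddle frame in the backtrace turned out not to mean that Paddle caused the crash. For that reason the helper only reports which signatures occur and does not assign blame.

```python
import re
from typing import Dict

# Signatures copied from the failing CI log quoted above.
SIGNATURES: Dict[str, "re.Pattern[str]"] = {
    "sphinx_gallery_build_finished": re.compile(r"Extension error \(sphinx_gallery\.docs_resolv\)"),
    "heap_corruption": re.compile(r"free\(\): invalid pointer"),
    "paddle_signal_handler": re.compile(r"paddle::framework::SignalHandle"),
}


def triage(log_text: str) -> Dict[str, bool]:
    """Report which known failure signatures appear anywhere in the log."""
    return {name: bool(pattern.search(log_text)) for name, pattern in SIGNATURES.items()}


if __name__ == "__main__":
    sample = """
    Extension error (sphinx_gallery.docs_resolv):
    free(): invalid pointer
    0   paddle::framework::SignalHandle(char const*, int)
    """
    for name, hit in triage(sample).items():
        print(f"{name}: {'found' if hit else 'absent'}")
```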


[GitHub] [tvm] jiangjiajun commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

jiangjiajun commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1002665248


   Do you know how to reproduce the problem, or which script I should run?


[GitHub] [tvm] manupa-arm commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-992320714


   Hi @masahi ,
   
   FYI, I'd be careful with S3.
   
   I was trying to use the image from S2 in S3, as mentioned here: https://github.com/apache/tvm/issues/9659
   Unfortunately, it timed out and left the image on the node. Only then did I realize that our jobs don't start on clean CI nodes in terms of images.
   
   The ci_cpu upgrade is currently blocked on this, and we don't have access to any of the nodes. cc: @leandron 
   https://github.com/apache/tvm/issues/9705.
   
   


[GitHub] [tvm] manupa-arm edited a comment on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

manupa-arm edited a comment on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-992320714


   Hi @masahi ,
   
   FYI, I'd be careful with S3.
   
   I was trying to use the image from S2 in S3, as mentioned here: https://github.com/apache/tvm/issues/9659
   Unfortunately, it timed out and left the image on the node. Only then did I realize that our jobs don't start on clean CI nodes in terms of images. Now the node does not have enough memory.
   
   The ci_cpu upgrade is currently blocked on this, and we don't have access to any of the nodes. cc: @leandron 
   https://github.com/apache/tvm/issues/9705.
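CI nodes are not guaranteed to start clean, so a job that pulls a large image could check free space first instead of failing partway through. This is a minimal sketch under several assumptions. It assumes the shortage above is disk space taken up by leftover images. It assumes image data lives under `/var/lib/docker`, Docker's default data root on Linux. The threshold is an arbitrary example. `has_room_for_image` is a hypothetical helper, not part of TVM's Jenkins setup. Reclaiming space would be a separate step, for example with `docker image prune`.

```python
import os
import shutil


def has_room_for_image(required_gib: float, path: str = "/var/lib/docker") -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    # Fall back to the root filesystem if Docker's data directory is not present here.
    target = path if os.path.exists(path) else "/"
    free_gib = shutil.disk_usage(target).free / 2**30
    return free_gib >= required_gib


if __name__ == "__main__":
    # 30 GiB is an example threshold, not a measured size of any CI image.
    ok = has_room_for_image(30)
    print("enough free disk space" if ok else "low disk space: remove stale images before pulling")
```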
   
   


[GitHub] [tvm] leandron closed issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

leandron closed issue #9713:
URL: https://github.com/apache/tvm/issues/9713


   


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-998352447


   @manupa-arm @leandron @areusch I'm not familiar with the new CI image update protocol. How does one push an image to `tlcpackstaging`? I've tried
   
   ```
   docker push tlcpackstaging/ci_gpu:20211219-100400-c0d326dbd
   ```
   
   But got:
   ```
   denied: requested access to the resource is denied
   ```
   
   Previously, I would just create an image like ci_gpu:v0.76 and push it to the tlcpack Docker Hub org.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1002564207


   @jiangjiajun Unfortunately, it didn't help: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/184/pipeline
   
   Were you able to reproduce the issue locally?


[GitHub] [tvm] masahi commented on issue #9713: [CI Image] Update PyTorch to v1.10 in GPU image

masahi commented on issue #9713:
URL: https://github.com/apache/tvm/issues/9713#issuecomment-1003368109


   @jiangjiajun OK, it turned out the error has nothing to do with paddle. After applying the mitigation for the PyTorch + LLVM symbol conflict issue (https://github.com/apache/tvm/issues/9362#issuecomment-955263494), the `free(): invalid pointer` error and the backtrace from paddle no longer appear: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/185/pipeline/267

