Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/11/12 04:25:18 UTC

[GitHub] [incubator-mxnet] DickJC123 opened a new issue #20738: oneapi build issue "hash sum mismatch" is affecting multiple PR's

DickJC123 opened a new issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738


   ## Description
   Here are two independent PRs that hit the failure:
   https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-20635/38/pipeline
   https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-20734/5/pipeline
   
   The failure has been reported as an issue with the mirrors supplying oneapi: https://community.intel.com/t5/Registration-Download-Licensing/OneAPI-apt-repository-broken/m-p/1329104
   
   I suspect there might be more to it, based on two observations:
   
   1. The onednn lib is installed by a RUN command in Dockerfile.build.ubuntu.  This creates an intermediate docker image that is pulled in from cache in the failing builds:
   ```
   [2021-11-11T23:00:22.939Z] Step 5/20 : RUN export DEBIAN_FRONTEND=noninteractive ...
   [2021-11-11T23:00:23.196Z]  ---> Using cache
   [2021-11-11T23:00:23.196Z]  ---> 1a09ef0af63e
   ```
   The image ID is the same one we've seen for a week or more, well before the apparent changes to the mirrors. So are we not handling cached docker images properly?
   
   2. The actual error is in an `apt-get update` performed by a later RUN command that installs TensorRT and cuDNN. Perhaps the intel repo used to install onednn in the earlier RUN command should be removed from the container in that same step, since the installation is complete by then? It's possible that the command `add-apt-repository -r "deb https://apt.repos.intel.com/oneapi all main"` would perform that action. If the intel repo were no longer in /etc/apt/sources.list, presumably the currently failing `apt-get update` would succeed.
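
The removal proposed in observation 2 could also be sketched without `add-apt-repository`. The helper below, `remove_apt_source`, is hypothetical and not part of the MXNet Dockerfile: it filters every line that mentions a repo URL out of an apt sources file. It assumes the entry lives in a single file such as /etc/apt/sources.list; entries under /etc/apt/sources.list.d/ would need the same treatment file by file.

```shell
# Hypothetical helper (an assumption for illustration, not MXNet CI code):
# drop every line that mentions REPO_URL from an apt sources file, so a
# later `apt-get update` no longer contacts that repository.
remove_apt_source() {
    repo_url="$1"
    list_file="${2:-/etc/apt/sources.list}"
    tmp_file="$(mktemp)" || return 1
    # grep -v exits with status 1 when no lines remain; that is not an
    # error here. Any other non-zero status (e.g. unreadable file) is.
    grep -vF -- "$repo_url" "$list_file" > "$tmp_file" || [ $? -eq 1 ] || {
        rm -f "$tmp_file"
        return 1
    }
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$list_file" && rm -f "$tmp_file"
}

# Intended use at the end of the RUN step that installs from the repo
# (not executed here):
#   remove_apt_source "https://apt.repos.intel.com/oneapi"
```

Whether this or `add-apt-repository -r` is the better fit depends on how the repo was added in the first place, which would need checking against the actual Dockerfile.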
   
   ### Error Message
   ```
   [2021-11-11T23:00:39.105Z] Err:9 https://apt.repos.intel.com/oneapi all/main all Packages
   [2021-11-11T23:00:39.105Z]   Hash Sum mismatch
   [2021-11-11T23:00:39.105Z]   Hashes of expected file:
   [2021-11-11T23:00:39.105Z]    - Filesize:21072 [weak]
   [2021-11-11T23:00:39.105Z]    - SHA512:7082767f95f6e40ad31deb8a9df205fa726ef3f4821ff6982d507f2f91adb57c282d1fbe3253f610b3e07f77a0c3c2320ed2c78b8d4b5b648928dd5c1fea271e
   [2021-11-11T23:00:39.105Z]    - SHA256:7e91d4ace2815407f999e88e5296f678447b9577e1f84af4addc7212c8eb32b0
   [2021-11-11T23:00:39.105Z]    - SHA1:53e523680f4f09015f82673434772a6ec112e8f2 [weak]
   [2021-11-11T23:00:39.105Z]    - MD5Sum:3f125fa13d509dd4e66fa49ae3d5af96 [weak]
   [2021-11-11T23:00:39.105Z]   Hashes of received file:
   [2021-11-11T23:00:39.105Z]    - SHA512:5af0e2266d2ef7cfd42b907c68d21b020e8e1f6c516e9fb35c7affcd52d047ffedec885f14685eaf6539edfc23c0da8e9c7035bcede483a331d9c66e5dce8c54
   [2021-11-11T23:00:39.105Z]    - SHA256:97bb376982553d6f5ae07c29a79fd653295caf7599cd6deb3c051c90a0290af1
   [2021-11-11T23:00:39.105Z]    - SHA1:9e1ac9d3f961d4e376cbc55758a334cc158a9603 [weak]
   [2021-11-11T23:00:39.105Z]    - MD5Sum:db23233f3ef8572c745ff537a2b2fdb8 [weak]
   [2021-11-11T23:00:39.105Z]    - Filesize:21072 [weak]
   [2021-11-11T23:00:39.105Z]   Last modification reported: Tue, 05 Oct 2021 04:38:36 +0000
   ```
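
For context, apt raises "Hash Sum mismatch" when a downloaded index file does not match the hashes listed in the repository's Release file. In the log above the file size matches (21072) but every digest differs, which is what one would expect if the mirror served a different revision of the `Packages` file. The check can be illustrated with a minimal sketch; `verify_sha256` is a hypothetical helper and assumes GNU coreutils `sha256sum` is available.

```shell
# Hypothetical helper (an assumption for illustration): return 0 only if
# FILE hashes to EXPECTED_SHA256, otherwise report both digests on stderr.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    # If the file cannot be read, the digest is empty and the check fails.
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "Hash Sum mismatch for $file" >&2
    echo "  expected: $expected" >&2
    echo "  received: $actual" >&2
    return 1
}

# Intended use with the expected digest from the log above and a locally
# downloaded copy of the index (the file name is a placeholder):
#   verify_sha256 Packages 7e91d4ace2815407f999e88e5296f678447b9577e1f84af4addc7212c8eb32b0
```

Running this against copies fetched at different times might help tell a stale mirror apart from a stale local cache.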
   
   ## To Reproduce
   Have not repro'd outside of CI runs.
   
   ### Steps to reproduce
   
   ## What have you tried to solve it?
   
   I was not able to repro the failure using the recipe posted to the intel site, i.e. it worked fine for me.
   ## Environment
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] TaoLv commented on issue #20738: oneapi build issue "hash sum mismatch" is affecting multiple PR's

Posted by GitBox <gi...@apache.org>.
TaoLv commented on issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738#issuecomment-967158403


   Thank you for reporting the issue, @DickJC123. Actually, the apt source was added to install the MKL BLAS library rather than oneDNN.
   
   Hi @yinghu5 @jingxu10, do you know who is managing the oneAPI apt repository? Thanks.




[GitHub] [incubator-mxnet] DickJC123 edited a comment on issue #20738: oneapi build issue "hash sum mismatch" is affecting multiple PR's

Posted by GitBox <gi...@apache.org>.
DickJC123 edited a comment on issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738#issuecomment-969216313


   It's possible the problem has gone away, since I've been able to get some clean CI runs on a side debug-PR I created (https://github.com/apache/incubator-mxnet/pull/20739).  The only improvement from that work I would suggest is the following line:
   
   https://github.com/apache/incubator-mxnet/blob/705e3d87564a11308ec37c7d0ce07244e14f409c/ci/docker/Dockerfile.build.ubuntu#L103
   
   If some mirrors are serving up the wrong files, the symptom is that every `apt-get update` run with the repo in the apt repo list has a chance of hitting a bad mirror server and getting the hash-mismatch error. Thus, I recommend that the docker RUN command that adds the oneapi repo and installs some packages (seen as "step 5" in the log) also remove the repo from the apt repo list once the install is done. That way, when another docker RUN also does an `apt-get update` (e.g. "step 20" to install tensorrt and cudnn), it won't needlessly reach out to oneapi mirrors.
   
   FYI @josephevans. If you add the line to a PR of yours, you will have to tweak the use of '&&' and '\' in the prior line.




[GitHub] [incubator-mxnet] akarbown commented on issue #20738: oneapi build issue "hash sum mismatch" is affecting multiple PR's

Posted by GitBox <gi...@apache.org>.
akarbown commented on issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738#issuecomment-975466620


   
   
   
   
   > Thank you for reporting the issue, @DickJC123. Actually the apt source was added to install MKL BLAS library, rather than oneDNN.
   > 
   > Hi @yinghu5 @jingxu10, do you know who is managing the oneAPI apt repository? Thanks.
   
   I saw the same behavior when testing the latest changes to the Ubuntu Dockerfile with the updated oneMKL. At the time, it was explained as 'Mirror sync in progress', and clearing the cache with `apt-get clean` was supposed to help. I filed an internal Jira ticket for it then; I can reopen it and add you (@TaoLv, @yinghu5, @jingxu10) to the task.
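
If the mismatch really is a transient mirror sync, one way to ride it out is to retry the update and clear the cache between attempts. This is only a sketch: `retry_with_cleanup` and the `RETRY_DELAY` variable are hypothetical names, and whether `apt-get clean` actually clears the error depends on the state of the mirror.

```shell
# Hypothetical helper (an assumption for illustration): run a command up
# to MAX_TRIES times, running CLEANUP_CMD and pausing between failures.
retry_with_cleanup() {
    max_tries="$1"
    cleanup_cmd="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        echo "attempt $attempt of $max_tries failed: $*" >&2
        if [ "$attempt" -lt "$max_tries" ]; then
            # A failing cleanup should not abort the retry loop.
            sh -c "$cleanup_cmd" || true
            # RETRY_DELAY is in seconds; 30 is an arbitrary default.
            sleep "${RETRY_DELAY:-30}"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Intended use inside the image build (not executed here):
#   retry_with_cleanup 3 'apt-get clean' apt-get update
```

A retry only hides the problem if the mirrors eventually converge, so it complements removing the repo after installation rather than replacing it.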




[GitHub] [incubator-mxnet] DickJC123 commented on issue #20738: oneapi build issue "hash sum mismatch" is affecting multiple PR's

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on issue #20738:
URL: https://github.com/apache/incubator-mxnet/issues/20738#issuecomment-966912744


   I created a PR to experiment with possible fixes to this problem. Editing the docker "step 5" RUN line that installs oneapi invalidated the cached layer and forced that command to execute, and the log shows a similar hash mismatch error: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fsanity/detail/PR-20739/2/pipeline/39/
   
   I conclude it's probably a problem with the public mirror of oneapi.  Thoughts @TaoLv?



