Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/10/18 08:12:23 UTC

[GitHub] [incubator-mxnet] Neutron3529 opened a new issue #19373: what's wrong with PR test?

Neutron3529 opened a new issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373


   Currently, many PRs are failing because of bad network connections.
   This affects not only my PR but many others as well.
   ```
   [2020-10-17T00:02:07.148Z] 2020-10-17 00:02:05,525 - root - ERROR - ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
   [2020-10-17T00:02:07.148Z] Traceback (most recent call last):
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
   [2020-10-17T00:02:07.148Z]     chunked=chunked)
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
   [2020-10-17T00:02:07.148Z]     six.raise_from(e, None)
   [2020-10-17T00:02:07.148Z]   File "<string>", line 3, in raise_from
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
   [2020-10-17T00:02:07.148Z]     httplib_response = conn.getresponse()
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3.6/http/client.py", line 1346, in getresponse
   [2020-10-17T00:02:07.148Z]     response.begin()
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3.6/http/client.py", line 307, in begin
   [2020-10-17T00:02:07.148Z]     version, status, reason = self._read_status()
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3.6/http/client.py", line 268, in _read_status
   [2020-10-17T00:02:07.148Z]     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
   [2020-10-17T00:02:07.148Z]   File "/usr/lib/python3.6/socket.py", line 586, in readinto
   [2020-10-17T00:02:07.148Z]     return self._sock.recv_into(b)
   [2020-10-17T00:02:07.148Z] ConnectionResetError: [Errno 104] Connection reset by peer
   ```
   No change to a PR can resolve this kind of exception.
   I re-ran the tests at least 5 times, and most of them failed.
   
   What's wrong?
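   
   For reference, a transient `ConnectionResetError` like the one in the traceback is often worked around by retrying with exponential backoff. The following is only a minimal sketch under assumptions: `retry_on_reset`, `fetch`, `attempts`, `base_delay`, and `sleep` are hypothetical names that do not come from the MXNet CI scripts, and an HTTP library such as urllib3 may wrap the error in its own exception type, which would then need to be caught instead. A retry only masks transient failures; it does not fix an underlying CI problem.

```python
import time


def retry_on_reset(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, retrying on ConnectionResetError.

    All names here are hypothetical and chosen for this sketch only.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionResetError:
            # Give up after the final attempt and surface the original error.
            if attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

   The `sleep` parameter is injectable so that the backoff schedule can be exercised without actually waiting.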
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] josephevans edited a comment on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
josephevans edited a comment on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712425848


   I think there are two issues today. First, I found an expired GPG key that was preventing R packages from being installed on Ubuntu 16.04. I created PR #19377 for this. I'm waiting for my PR to pass before backporting it.
   
   Second, I see errors when trying to uncompress the Packages file from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/
   
   It looks like a new file was recently pushed, so maybe this has been fixed by NVIDIA.
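   
   One way to check whether a downloaded index file is corrupt is to try decompressing it in full. This is only a sketch, assuming the Packages file is served gzip-compressed (the comment above does not say which compression is used); `gzip_is_intact` is a hypothetical helper name.

```python
import gzip
import zlib


def gzip_is_intact(data: bytes) -> bool:
    """Return True if `data` decompresses completely as gzip."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or checksum, EOFError a truncated
        # stream, and zlib.error corrupt compressed data.
        return False
```

   The function takes the raw bytes of the download, so it can be run against a file read from disk or against a response body.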
   




[GitHub] [incubator-mxnet] Neutron3529 closed issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
Neutron3529 closed issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373


   




[GitHub] [incubator-mxnet] josephevans commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
josephevans commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-713215470


   CI seems much more stable today with the new AMI. I released an updated AMI to address ARMv8 test failures caused by the qemu installation. We should no longer be seeing the docker connection issues (or unexpected EOF errors), and 2 of the 3 PRs to fix the other CI issue (expired GPG key) have been merged.
   
   This issue can be closed.




[GitHub] [incubator-mxnet] Neutron3529 commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
Neutron3529 commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-713515860


   It seems the CI works now, so I'm closing this issue.




[GitHub] [incubator-mxnet] josephevans commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
josephevans commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712425848


   I think there are two issues today. First, I found an expired GPG key that was preventing R packages from being installed on Ubuntu 16.04. I created PR #19377 for this. I'm waiting for my PR to pass before backporting it.
   
   Second, I see errors when trying to uncompress the Packages file from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/
   
   It looks like a new file was recently pushed, so maybe this has been fixed by NVIDIA.
   
   




[GitHub] [incubator-mxnet] leezu commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712268292


   It's a CI issue and needs to be fixed. It's unrelated to your PR.
   
   cc @sandeep-krishnamurthy @josephevans 






[GitHub] [incubator-mxnet] leezu commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712478319


   The NVIDIA issue may affect more than the Ubuntu 16.04 repository mentioned above. https://github.com/NVIDIA/nvidia-docker/issues/1402 contains some more info (though that issue was closed because it wasn't directed to the correct owner).




[GitHub] [incubator-mxnet] sandeep-krishnamurthy commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
sandeep-krishnamurthy commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-713220292


   Thank you so much Joe.




[GitHub] [incubator-mxnet] josephevans commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
josephevans commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712623241


   Ok, I believe I finally found the culprit. The AMIs used for our Jenkins slaves have auto-update turned on, and based on the log files of the slave instances, it looks like docker was being auto-updated and restarted, which was killing the log output of the containers (and therefore the Jenkins jobs).
   
   I've created a new AMI for mxnetlinux_cpu hosts with updated software versions, which also adds an option to the docker config to hopefully prevent this in the future. See https://docs.docker.com/config/containers/live-restore/ - Thanks @leezu for the recommendation.
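   
   The linked Docker page describes the `live-restore` daemon option, which is set in the daemon configuration file (typically `/etc/docker/daemon.json` on Linux) and keeps containers running while the daemon is down. A minimal sketch of adding that key while preserving existing settings could look like the following; the helper name `enable_live_restore` is hypothetical, and how the option was actually applied on the AMI is not stated here.

```python
import json


def enable_live_restore(config_text: str) -> str:
    """Return daemon.json content with "live-restore" set to true.

    Existing keys are preserved; empty input is treated as an empty config.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["live-restore"] = True
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # The "log-driver" entry is only an illustrative pre-existing setting.
    print(enable_live_restore('{"log-driver": "json-file"}'))
```

   The daemon has to reload its configuration before the new setting takes effect.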




[GitHub] [incubator-mxnet] leezu commented on issue #19373: what's wrong with PR test?

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #19373:
URL: https://github.com/apache/incubator-mxnet/issues/19373#issuecomment-712440878


   @josephevans thank you for looking into the issue. But please note that the issues you mention are unrelated as they only affect the v1.x branch, whereas the issue described here affects the master branch.

