Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/12/06 19:36:09 UTC

[GitHub] [incubator-mxnet] larroy commented on issue #16995: CI timeout on unix-cpu Python2 test

larroy commented on issue #16995: CI timeout on unix-cpu Python2 test
URL: https://github.com/apache/incubator-mxnet/issues/16995#issuecomment-562708883
 
 
   I traced some of the timeouts to the EFS rate limit. The shared EFS ccache is currently disabled in the master CI for this reason. I don't like shared state, but a shared EFS ccache might indeed provide some value, which should be measured; however, the cache was too large, and too many small files overloaded EFS IO. In this case, from what I see, the test itself is taking 4h, and the cache is disabled.
   
   I would suggest checking the test output at http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-16986/runs/2/nodes/294/steps/685/log/?start=0 , running the tests locally on the same type of instance, and comparing the duration of each test between the two runs (say, by sorting two columns in Excel) to see whether some test is getting stuck and slowing down the full suite.
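
   As an alternative to a spreadsheet, the comparison could be scripted. The following is only a rough sketch: it assumes each log contains per-test timing lines of the form `<test name>: <seconds>s`, which may not match the actual Jenkins output, so the pattern would need adjusting. The names `parse_durations` and `compare_durations` are made up for this illustration.

```python
import re

# ASSUMPTION: timing lines look like "test_module.test_name: 12.3456s".
# Adjust this pattern to whatever the real CI and local logs contain.
DURATION_LINE = re.compile(r"^(?P<name>\S+): (?P<secs>\d+(?:\.\d+)?)s$")


def parse_durations(log_text):
    """Return {test name: seconds} for every line matching the assumed format."""
    durations = {}
    for line in log_text.splitlines():
        match = DURATION_LINE.match(line.strip())
        if match:
            durations[match.group("name")] = float(match.group("secs"))
    return durations


def compare_durations(ci, local):
    """Return (name, ci_secs, local_secs, slowdown) rows for tests present in
    both runs, sorted so the largest CI slowdown comes first."""
    rows = [
        (name, ci[name], local[name], ci[name] - local[name])
        for name in ci.keys() & local.keys()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def missing_from(ci, local):
    """Tests timed locally but absent from the CI log, e.g. because they hung."""
    return sorted(local.keys() - ci.keys())
```

   Tests that appear at the top of `compare_durations` are the ones running much slower on CI than locally. A test that never reports a duration in the CI log would show up in `missing_from` instead, which is worth checking when the suite times out before finishing.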

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services