Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/29 18:10:18 UTC
[GitHub] [incubator-mxnet] apeforest opened a new issue #14569: performance degradation in model inference from 1.3.1 to 1.4.0
URL: https://github.com/apache/incubator-mxnet/issues/14569
There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR. This was caught in the MMS nightly runs; the changes in this PR appear to be causing the issue.
**Setup**
We use MMS docker images to run load tests, we can start a local container using the following command.
```bash
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
```
OpenCV 3.2 and CUDA 9.2 were used to build MXNet.
Load testing was done with locust. To install locust:
```bash
pip install locust
```
**Download test image**
```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
```
The locust script used for load testing:
```python
# test_resnet_18.py
import os

from locust import HttpLocust, TaskSet, task

# Read the test image once at module load time
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()

class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data,
                         headers={'Content-Type': 'image/jpeg'})

class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100  # fixed 100 ms wait between requests
    max_wait = 100
```
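As a rough consistency check on the throughput numbers reported below: with `min_wait = max_wait = 100` ms and a single client, locust issues one request every (100 ms + response time), so the achievable request rate is approximately 1000 / (100 + median latency in ms). A small sketch (the helper name is hypothetical):

```python
def expected_rps(wait_ms: float, median_latency_ms: float) -> float:
    """Approximate single-client request rate for a fixed locust wait time.

    Each loop iteration spends wait_ms waiting plus one round trip.
    """
    return 1000.0 / (wait_ms + median_latency_ms)

# With the 31 ms median observed on the baseline build:
print(round(expected_rps(100, 31), 1))  # 7.6, matching the ~7.60 req/s reported below
```

The same arithmetic with the 38 ms regressed median gives about 7.2 req/s, so the req/s columns in both result tables are consistent with their median latencies.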
**Running the load test**
Register and load the model:
```bash
# Register and load resnet-18 model archive
curl -X POST "http://127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar"
```
Start a single worker and run the latency test:
```bash
# Start worker and run latency test
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
```
**To change the MXNet version/build in the docker image:**
**NOTE** By default the image pulls the most recent pip release of MXNet.
```bash
# Attach to the running container as root
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# Press ctrl + p, then ctrl + q to detach from the container
# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
```
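When swapping wheels like this it is easy to accidentally benchmark the old build; one way to double-check is to inspect `mxnet.__version__` inside the container after the worker restart. A hypothetical helper for comparing such version strings:

```python
import re

def parse_version(v: str) -> tuple:
    """Parse a version string like '1.3.0post0' or '1.4.0' into (major, minor, patch)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(x) for x in m.groups())

# e.g. inside the container: import mxnet; parse_version(mxnet.__version__)
print(parse_version("1.3.0post0") < parse_version("1.4.0"))  # True
```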
**Results**
On mxnet-cu92==1.3.0post0:
```bash
# locust result
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 152 0(0.00%) 31 30 39 | 31 7.60
--------------------------------------------------------------------------------------------------------------------------------------------
Total 152 0(0.00%) 7.60
Percentage of the requests completed within given times
Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 100%
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 152 31 31 31 31 32 33 33 34 280
--------------------------------------------------------------------------------------------------------------------------------------------
Total 152 31 31 31 31 32 33 33 34 280
```
On mxnet-cu92 with commit https://github.com/apache/incubator-mxnet/commit/f9f74169bb05f85d85dec5991aa5fc9050dec9f6
```bash
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 141 0(0.00%) 41 37 337 | 38 7.20
--------------------------------------------------------------------------------------------------------------------------------------------
Total 141 0(0.00%) 7.20
Percentage of the requests completed within given times
Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 100%
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 141 38 39 39 40 40 42 49 49 340
--------------------------------------------------------------------------------------------------------------------------------------------
Total 141 38 39 39 40 40 42 49 49 340
```
This regression thus carries over to 1.3.1.
Based on the results above, there is a roughly 30% increase in latency/inference time for resnet-18.
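For reference, the ~30% figure follows directly from the average latencies in the two tables above:

```python
baseline_avg_ms = 31   # average latency on mxnet-cu92==1.3.0post0
regressed_avg_ms = 41  # average latency on the build including the suspect commit

increase = (regressed_avg_ms - baseline_avg_ms) / baseline_avg_ms
print(f"{increase:.0%}")  # 32%
```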
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services