Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/29 18:10:18 UTC
[GitHub] [incubator-mxnet] apeforest opened a new issue #14569: performance degradation in model inference from 1.3.1 to 1.4.0
URL: https://github.com/apache/incubator-mxnet/issues/14569
There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR. This was caught in the MMS nightly runs; the changes in this PR appear to be causing the issue.
**Setup**
We use MMS docker images to run load tests, we can start a local container using the following command.
```bash
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
```
OpenCV 3.2 and CUDA 9.2 were used to build MXNet.
Load testing was done with locust. To install locust:
```bash
pip install locust
```
**Download test image**
```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
```
The locust script used for load testing:
```python
# test_resnet_18.py
import os

from locust import HttpLocust, TaskSet, task

# Read the test image once at module load time
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()

class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data,
                         headers={'Content-Type': 'image/jpeg'})

class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100  # fixed 100 ms wait between requests
    max_wait = 100
```
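As a rough consistency check on the throughput numbers reported below: with `min_wait = max_wait = 100` ms and a single client, locust issues one request every (100 ms + response time), so the achievable request rate is approximately 1000 / (100 + median latency in ms). A small sketch (the helper name is hypothetical):

```python
def expected_rps(wait_ms: float, median_latency_ms: float) -> float:
    """Approximate single-client request rate for a fixed locust wait time.

    Each loop iteration spends wait_ms waiting plus one round trip.
    """
    return 1000.0 / (wait_ms + median_latency_ms)

# With the 31 ms median observed on the baseline build:
print(round(expected_rps(100, 31), 1))  # 7.6, matching the ~7.60 req/s reported below
```

The same arithmetic with the 38 ms regressed median gives about 7.2 req/s, so the req/s columns in both result tables are consistent with their median latencies.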
**Running the load test**
Register and load the model:
```bash
# Register and load resnet-18 model archive
curl -X POST "http://127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar"
```
Start a single worker and run the latency test:
```bash
# Start worker and run latency test
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
```
**To change the MXNet version/build in the docker image:**
**NOTE** By default the image pulls the most recent pip release of MXNet.
```bash
# Attach to the running container as root
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# Press ctrl + p, then ctrl + q to detach from the container
# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
```
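When swapping wheels like this it is easy to accidentally benchmark the old build; one way to double-check is to inspect `mxnet.__version__` inside the container after the worker restart. A hypothetical helper for comparing such version strings:

```python
import re

def parse_version(v: str) -> tuple:
    """Parse a version string like '1.3.0post0' or '1.4.0' into (major, minor, patch)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(x) for x in m.groups())

# e.g. inside the container: import mxnet; parse_version(mxnet.__version__)
print(parse_version("1.3.0post0") < parse_version("1.4.0"))  # True
```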
**Results**
On mxnet-cu92==1.3.0post0:
```bash
# locust result
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 152 0(0.00%) 31 30 39 | 31 7.60
--------------------------------------------------------------------------------------------------------------------------------------------
Total 152 0(0.00%) 7.60
Percentage of the requests completed within given times
Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 100%
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 152 31 31 31 31 32 33 33 34 280
--------------------------------------------------------------------------------------------------------------------------------------------
Total 152 31 31 31 31 32 33 33 34 280
```
On mxnet-cu92 with commit https://github.com/apache/incubator-mxnet/commit/f9f74169bb05f85d85dec5991aa5fc9050dec9f6
```bash
Name # reqs # fails Avg Min Max | Median req/s
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 141 0(0.00%) 41 37 337 | 38 7.20
--------------------------------------------------------------------------------------------------------------------------------------------
Total 141 0(0.00%) 7.20
Percentage of the requests completed within given times
Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 100%
--------------------------------------------------------------------------------------------------------------------------------------------
POST /predictions/resnet-18 141 38 39 39 40 40 42 49 49 340
--------------------------------------------------------------------------------------------------------------------------------------------
Total 141 38 39 39 40 40 42 49 49 340
```
This regression thus carries over to 1.3.1.
Based on the results above, there is a roughly 30% increase in latency/inference time for resnet-18.
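For reference, the ~30% figure follows directly from the average latencies in the two tables above:

```python
baseline_avg_ms = 31   # average latency on mxnet-cu92==1.3.0post0
regressed_avg_ms = 41  # average latency on the build including the suspect commit

increase = (regressed_avg_ms - baseline_avg_ms) / baseline_avg_ms
print(f"{increase:.0%}")  # 32%
```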
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services