Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/02/02 09:25:29 UTC

[GitHub] arcadiaphy opened a new issue #14057: validation stucks when training gluoncv ssd model

URL: https://github.com/apache/incubator-mxnet/issues/14057
 
 
   ## Description
   When training a gluoncv SSD model, validation sometimes takes much longer than a training epoch. After debugging, I found that the `box_nms` operator accounts for most of that time.
   
   ## Environment info (Required)
   
   ```
       Centos 7
       CUDA: 9.0
       cudnn: 7 
       mxnet: 1.4.0.rc2
       gluon-cv: latest
   
   ```
   
   ## Minimum reproducible example
   The following snippet shows that `box_nms` takes a very long time when processing a large number of prior boxes:
   ```python
   import mxnet as mx
   import numpy as np
   
   np.random.seed(0)
   
   batch_size = 32
   prior_number = 100000
   data = np.zeros((batch_size, prior_number, 6))
   data[:, :, 0] = np.random.randint(-1, 1, (batch_size, prior_number))
   data[:, :, 1] = np.random.random((batch_size, prior_number))
   
   xmin = np.random.random((batch_size, prior_number))
   ymin = np.random.random((batch_size, prior_number))
   width = np.random.random((batch_size, prior_number))
   height = np.random.random((batch_size, prior_number))
   data[:, :, 2] = xmin
   data[:, :, 3] = ymin
   data[:, :, 4] = xmin + width
   data[:, :, 5] = ymin + height
   
   mx_data = mx.nd.array(data, ctx=mx.gpu(0))
   rv = mx.nd.contrib.box_nms(mx_data, overlap_thresh=0.5, valid_thresh=0.01, topk=400, score_index=1, id_index=0)
   mx.nd.waitall()
   
   ```
   
   ## What I have found out
   1. The GPU implementation of the stable sort in the `SortByKey` function degrades badly as the sort length grows.
   2. The `box_nms` operator does not remove background boxes during valid-box filtering, which leads to a large sort length.
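A minimal NumPy sketch of finding 2, reusing the synthetic data from the reproduction above: it counts how many boxes pass the valid-score filter alone versus the valid-score filter plus background filtering, i.e. how many entries would be left for the sort. It assumes that class id -1 marks a background box (the reproduction draws ids with `np.random.randint(-1, 1, ...)`, which yields -1 or 0) and that a box is valid when its score is strictly greater than `valid_thresh`. The helper `count_sort_candidates` and its `background_id` parameter are hypothetical, not part of MXNet.

```python
import numpy as np


def count_sort_candidates(ids, scores, valid_thresh=0.01, background_id=-1):
    """Hypothetical helper: count boxes left to sort, without and with background filtering.

    Assumes class id `background_id` (-1 by default) marks a background box.
    """
    valid = scores > valid_thresh                # score filter only
    valid_fg = valid & (ids != background_id)    # score filter + background filter
    return int(valid.sum()), int(valid_fg.sum())


np.random.seed(0)
batch_size = 32
prior_number = 100000
# Same id and score distributions as the reproduction: ids in {-1, 0}, scores in [0, 1).
ids = np.random.randint(-1, 1, (batch_size, prior_number))
scores = np.random.random((batch_size, prior_number))

before, after = count_sort_candidates(ids, scores)
print("score filter only:", before)
print("score + background filter:", after)
```

In this synthetic data only about half of the ids are -1, so background filtering roughly halves the number of boxes to sort. During SSD validation most boxes are classified as background, so the reduction there should be much larger.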
   
   ## What I have done
   1. Added the `SORT_WITH_THRUST` compile definition in the Makefile: validation is still very slow.
   2. Added background-box filtering in `box_nms`: validation speeds up dramatically, since most boxes are classified as background.
   
   I will open a PR for the second solution.
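The issue does not include the patch itself, so the following is only a CPU reference sketch in NumPy of the idea behind the second fix, not MXNet's GPU `box_nms` code: drop both low-score and background boxes first, sort only the survivors by score, keep the top `topk`, then run greedy NMS. It assumes each row is laid out like the reproduction's data (`[id, score, xmin, ymin, xmax, ymax]`, i.e. `id_index=0`, `score_index=1`, corner coordinates), that class id -1 is background, and that suppression only applies between boxes of the same class. The names `iou` and `nms_with_background_filter` and the `background_id` parameter are hypothetical, and unlike the operator this sketch returns the indices of the kept rows.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one corner-format box [xmin, ymin, xmax, ymax] and an (M, 4) array."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    return inter / np.maximum(union, 1e-12)  # guard against degenerate boxes


def nms_with_background_filter(rows, overlap_thresh=0.5, valid_thresh=0.01,
                               topk=400, background_id=-1):
    """Hypothetical greedy per-class NMS on one (N, 6) array of
    [id, score, xmin, ymin, xmax, ymax]. Returns kept row indices, highest score first."""
    ids, scores, boxes = rows[:, 0], rows[:, 1], rows[:, 2:6]
    # Drop low-score AND background boxes before sorting, so the sort only sees real candidates.
    candidates = np.nonzero((scores > valid_thresh) & (ids != background_id))[0]
    # Sort the remaining candidates by descending score; stable, so ties keep input order.
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    if topk > 0:
        order = order[:topk]
    keep = []
    suppressed = np.zeros(len(order), dtype=bool)
    for i in range(len(order)):
        if suppressed[i]:
            continue
        keep.append(order[i])
        rest = order[i + 1:]
        # Suppress later boxes of the same class that overlap the kept box too much.
        overlap = iou(boxes[order[i]], boxes[rest]) > overlap_thresh
        same_class = ids[rest] == ids[order[i]]
        suppressed[i + 1:] |= overlap & same_class
    return np.array(keep, dtype=int)


rows = np.array([
    [0, 0.9, 0.0, 0.0, 1.0, 1.0],     # kept: highest-scoring candidate
    [0, 0.8, 0.1, 0.1, 1.0, 1.0],     # suppressed: IoU 0.81 with row 0, same class
    [1, 0.7, 0.0, 0.0, 1.0, 1.0],     # kept: different class
    [-1, 0.95, 0.0, 0.0, 1.0, 1.0],   # dropped before sorting: background
    [0, 0.005, 2.0, 2.0, 3.0, 3.0],   # dropped before sorting: below valid_thresh
])
print(nms_with_background_filter(rows))  # [0 2]
```

Filtering before the sort is what matters for speed in this sketch: the greedy loop only ever sees at most `topk` boxes, but the sort sees every candidate that survives filtering.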
