Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/20 10:11:29 UTC

[GitHub] [tvm] masahi opened a new pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

masahi opened a new pull request #7137:
URL: https://github.com/apache/tvm/pull/7137


   Thanks for contributing to TVM!   Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @ them in the pull request thread.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750709698


   With the fix in this PR applied and using `rpn_pre_nms_top_n_test` = 200, on a GTX 1070 Ti:
   
   Torch GPU: 0.093 sec
   TVM GPU: 0.39 sec (no tuning or cudnn)
   
   





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750769271


   Also, I find that gluoncv MaskRCNN mostly follows the design of PyTorch MaskRCNN. But they apply sigmoid to the objectness network outputs, so in their case there are no negative scores and all inputs to RPN NMS, which could be thousands of boxes, are valid, even if by "valid" we mean the previous, wrong definition of having a positive score.
   
   So without this fix, if you compare the MaskRCNN performance of PyTorch and GluonCV after compiling them to TVM, the PyTorch model would run much faster because our cheat of ignoring negative boxes would kick in, while there are no negative boxes in GluonCV MaskRCNN. So equivalent models would give inconsistent results after being compiled to TVM. This is nonsense.
   
   So we have to accept the fact that we need to deal with lots of boxes in MaskRCNN.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751300057


   > One interesting thing about output result is: for pytorch 1.7, we can exact match the results of tvm vs pt with this change, but for pytorch 1.4 there is still mismatch which won't affect final accuracy
   
   Interesting, I didn't test on 1.4. If even the number of output boxes is different, then it must be an NMS issue, since NMS is the only thing that could change the number of boxes. I'm also fine if we can get the same output as the latest pytorch.
   
   Thanks for validating my fix. Luckily, I also found a way to parallelize the inner loop of GPU NMS, which should give a massive speedup. The change is this one: https://github.com/masahi/tvm/commit/c75c6ef00b1e89d6a6698aeb8048be6e6f13c0ff but since I'm away from my GPU at the moment, I haven't tested it yet. It also reduces the number of IOU tests from O(N ** 2) to O(# selected boxes * N).
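   For anyone following along, here is a rough numpy sketch of the idea (not the actual TIR kernel in that commit): each candidate only needs IOU tests against boxes already selected, so the total work is O(# selected boxes * N) instead of O(N ** 2), and the tests for one candidate are independent of each other, which is what makes the inner loop parallelizable.
   ```python
   import numpy as np

   def iou(box, others):
       # box: (4,), others: (k, 4), both as (x1, y1, x2, y2)
       x1 = np.maximum(box[0], others[:, 0])
       y1 = np.maximum(box[1], others[:, 1])
       x2 = np.minimum(box[2], others[:, 2])
       y2 = np.minimum(box[3], others[:, 3])
       inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
       area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
       return inter / (area(box) + area(others) - inter)

   def greedy_nms(boxes, scores, iou_threshold):
       order = np.argsort(-scores)   # walk candidates in descending score order
       keep = []
       for i in order:               # this outer walk stays sequential
           # IOU tests only against already-kept boxes; the tests for this
           # candidate are independent, so they can run in parallel on GPU.
           if not keep or (iou(boxes[i], boxes[keep]) <= iou_threshold).all():
               keep.append(i)
       return np.array(keep)
   ```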
   
   Hopefully next week I can send a PR with good perf improvement on GPU NMS and hence PyTorch MaskRCNN performance on GPU.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww 
   
   Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets.
   
    See the updated `test_object_detection.py`. I also added an example of setting `rpn_pre_nms_top_n_test` explicitly.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750709698


   With the fix in this PR applied and using `rpn_pre_nms_top_n_test` = 200, on GTX 1070 ti
   
   Torch GPU: 0.093 sec
   TVM GPU: 0.39 sec (no tuning or cudnn)
   
   





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww 
   
   Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets.
   
   See the updated `test_object_detection.py`. I also added an example of setting `rpn_pre_nms_top_n_test` explicitly.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750960526


   ok I ran a CPU perf comparison of the old code vs the new code + default params. As expected, there is almost no perf regression.
   
   ``` 
   Old code: 2.1364135856199997 sec
   After the fix in this PR: 2.1756499899999997 sec
   ```
   The old code spends 5 milliseconds on NMS. Here are the stats from the profile.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_old.txt#L51
   
   The new code gets far more boxes, but NMS spends only 20 milliseconds on it.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_1000.txt#L24
   
   The reason the new code doesn't make CPU perf worse is simple. Even though NMS now gets more boxes, PyTorch detection models do post-NMS topk after RPN, see https://github.com/pytorch/vision/blob/90645ccd0e774ad76200245e32222a23d09f2312/torchvision/models/detection/rpn.py#L261-L263
   
   So no matter how many boxes NMS gets, the number of boxes after NMS + post-NMS topk doesn't change and is bounded by the `rpn_post_nms_top_n_test` parameter, whose default is 1000. The CPU perf is always dominated by the large dense layer.
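   A rough sketch of that post-processing, using torchvision's own `nms` op just for illustration (the helper and parameter names here are made up):
   ```python
   import torch
   from torchvision.ops import nms

   def rpn_filter_sketch(boxes, scores, iou_thresh=0.7, post_nms_top_n=1000):
       # No matter how many proposals go *into* NMS (2000, 5000, ...),
       # at most post_nms_top_n come out, so the downstream heads always
       # see a bounded number of boxes.
       keep = nms(boxes, scores, iou_thresh)   # indices, sorted by descending score
       keep = keep[:post_nms_top_n]            # post-NMS topk, default 1000 in torchvision
       return boxes[keep], scores[keep]
   ```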
   
   Moreover, I'd like to reiterate that the new code is more correct and gives essentially the same output as PyTorch. At this point, if you are still not happy with this fix, I'm genuinely curious why.
   





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my take away from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS doesn't send boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit totally makes sense - it just means the probability is < 0.5, so a totally reasonable box can end up with a negative score (see the sketch below). We **shouldn't** arbitrarily cut negative boxes from RPN.
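   To make that concrete, a tiny numerical sketch (plain numpy, values made up for illustration):
   ```python
   import numpy as np

   def sigmoid(x):
       return 1.0 / (1.0 + np.exp(-x))

   # made-up raw objectness logits as they come out of the RPN head
   objectness_logits = np.array([2.3, 0.1, -0.4, -1.7])
   print(sigmoid(objectness_logits))
   # -> [0.909 0.525 0.401 0.154]; a logit of -0.4 still means ~40% objectness,
   #    so dropping every negative-score box throws away perfectly plausible proposals.
   ```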
   
   It's highly possible that one of the reasons you didn't get the same output from pytorch detection models after compiling to TVM is this incorrect assumption we've been making about negative boxes, because the RPN outputs in PyTorch/TVM are totally different - we are considering only about half of them.
   
   The good news is, I found a way to reduce the number of boxes that are sent to NMS. See these parameters https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71. They are something like a `topk` parameter applied to each level separately. The default is 1000, and there are five levels in RPN NMS, which explains why we are getting about 4500 boxes into RPN NMS. If we set `rpn_pre_nms_top_n_test` to 200, we will get at most 1000 boxes, 200 for each level in the feature pyramid. That will make the detection model significantly faster while still considering a lot of boxes to keep accuracy high.
   
   So please take a look at the above issue in torchvision and my comment carefully and let's merge this ASAP. 
   





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750766311


   > How about the perf numbers with default value of `rpn_pre_nms_top_n_test`?
   
   It is dominated by NMS which takes 2.1 seconds with default `rpn_pre_nms_top_n_test`. 





[GitHub] [tvm] masahi commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r548408319



##########
File path: tests/python/frontend/pytorch/test_object_detection.py
##########
@@ -70,7 +71,7 @@ def generate_jit_model(index):
     ]
 
     model_func = model_funcs[index]
-    model = TraceWrapper(model_func(pretrained=True))
+    model = TraceWrapper(model_func(pretrained=True, rpn_pre_nms_top_n_test=200))

Review comment:
    I think the default parameter of 1000 they picked is fairly conservative. This means for each level in the feature pyramid, of which there are 5 if we use a resnet 50 backbone, we get a maximum of 1000 x 5 boxes as input to RPN NMS. They have another parameter, `rpn_post_nms_top_n_test`, which is like a topk applied after NMS. This value is also 1000 by default, and unlike `rpn_pre_nms_top_n_test` it is not per level. This means we always have 1000 boxes after NMS regardless of `rpn_pre_nms_top_n_test`.
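    For example, both knobs can be set when constructing the torchvision model (the 200 here is just the value used in this test, not a recommendation):
    ```python
    import torchvision

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True,
        rpn_pre_nms_top_n_test=200,    # per-level topk applied *before* RPN NMS (default 1000)
        rpn_post_nms_top_n_test=1000,  # single topk applied *after* RPN NMS (this is the default)
    )
    model.eval()
    ```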
   







[GitHub] [tvm] zhiics edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750702471


   BTW, could you please run the Mask R-CNN benchmark to see if there is any big performance difference? There is a tutorial for it.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750733915


   ok I can provide an NMS performance comparison as one data point. When I was investigating the GPU NMS performance issue, the old code was taking 630 milliseconds, while with this fix it was 2.1 seconds. But again, that's because the new code is dealing with far more boxes and our GPU NMS code is currently extremely slow due to the sequential loop.
   
   According to the numbers I posted in https://github.com/apache/tvm/pull/7154, on CPU NMS is fast: the old code was spending only 8 milliseconds. So I don't expect NMS on CPU to be a big issue.
   
   After NMS, the PyTorch detection model does post-NMS topk, which selects 1000 boxes for later processing. So the perf difference should only be in NMS.





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750709698


   With the fix in this PR applied and using `rpn_pre_nms_top_n_test` = 200:
   
   Torch GPU: 0.093 sec
   TVM GPU: 0.39 sec (no tuning or cudnn)
   
   





[GitHub] [tvm] masahi commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r548162082



##########
File path: python/tvm/relay/frontend/pytorch.py
##########
@@ -1857,16 +1857,18 @@ def nms(self, inputs, input_types):
         scores = inputs[1]
         iou_threshold = inputs[2]
 
+        num_boxes = _op.shape_of(scores)
+
+        # TVM NMS assumes score > 0
+        scores = scores - _op.min(scores) + _op.const(1.0)
         # Generate data with shape (1, num_anchors, 5)
         scores = AttrCvt(op_name="expand_dims", extras={"axis": -1, "num_newaxis": 1})([scores], {})
-
-        # Prepare input data for get_valid_counts
         data = _op.concatenate([scores, boxes], -1)
         data = _op.expand_dims(data, 0, 1)
-        # Leverage get_valid_counts to sort the data and clear invalid boxes
-        ct, data, indices = get_relay_op("get_valid_counts")(
-            data, score_threshold=-1.0, id_index=-1, score_index=0
-        )
+        # PyTorch NMS doesn't have score_threshold, so no need to run get_valid_count

Review comment:
    In PyTorch, a negative score doesn't necessarily mean the box is invalid. See my comment in https://github.com/apache/tvm/pull/7137#issuecomment-750438126
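    For the record, a minimal numpy sketch (not the actual Relay code in the diff above) of why the shift used here is safe: it keeps every box, preserves the ranking, and satisfies the score > 0 assumption of TVM's NMS.
    ```python
    import numpy as np

    scores = np.array([0.8, -0.3, 2.1, -1.5])   # made-up RPN objectness logits, can be negative
    shifted = scores - scores.min() + 1.0        # same transform as the frontend change

    assert (shifted > 0).all()                                   # TVM NMS assumes positive scores
    assert (np.argsort(-shifted) == np.argsort(-scores)).all()   # ranking is unchanged
    print(shifted)                                               # [3.3 2.2 4.6 1. ]
    ```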







[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750704235


   @zhiics What before and after do you want to know? With this change, we need to set `rpn_pre_nms_top_n_test` to a lower value to get reasonable performance.
   
   I don't think a comparison to the old code is meaningful, because it is not doing the right thing. The old code is surely faster, but that's because we were cheating.





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750733915


   ok just as one data point: when I was investigating the GPU NMS performance issue, the old code was taking 630 milliseconds, while with this fix it was 2.1 seconds. But again, that's because the new code is dealing with far more boxes.
   
   According to the numbers I posted in https://github.com/apache/tvm/pull/7154, on CPU NMS is fast: the old code was spending only 8 milliseconds. So I don't expect NMS on CPU to be a big issue.
   
   After NMS, the PyTorch detection model does post-NMS topk, which selects 1000 boxes for later processing. So the perf difference should only be in NMS.





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750704235


   @zhiics What before and after do you want to know? With this change, we need to set `rpn_pre_nms_top_n_test` to a lower value to get reasonable performance.
   
   I don't think a comparison to the old code is meaningful, because it is not doing the right thing. If we find that the old code is faster, that's because we were cheating.





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750960526


   ok I ran a CPU perf comparison of the old code vs the new code + default params. As expected, there is almost no perf regression.
   
   ``` 
   Old code: 2.1364135856199997 sec
   After the fix in this PR: 2.1756499899999997 sec
   ```
   The old code spends 5 milliseconds on NMS. Here are the stats from the profile.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_old.txt#L51
   
   The new code gets far more boxes, but NMS spends only 20 milliseconds on it.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_1000.txt#L24
   
   The reason the new code doesn't make CPU perf worse is simple. Even though NMS now gets more boxes, PyTorch detection models do post-NMS topk after RPN, see https://github.com/pytorch/vision/blob/90645ccd0e774ad76200245e32222a23d09f2312/torchvision/models/detection/rpn.py#L261-L263
   
   So no matter how many boxes NMS gets, the number of boxes after NMS + post-NMS topk doesn't change and is bounded by the `rpn_post_nms_top_n_test` parameter, whose default is 1000. The CPU perf is always dominated by the large dense layer.
   
   Moreover, I'd like to reiterate that the new code is more correct and gives essentially the same output as PyTorch. At this point, if you are still not happy with this fix, I'm genuinely curious why.
   





[GitHub] [tvm] kevinthesun commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
kevinthesun commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r546923370



##########
File path: python/tvm/relay/frontend/pytorch.py
##########
@@ -1857,16 +1857,18 @@ def nms(self, inputs, input_types):
         scores = inputs[1]
         iou_threshold = inputs[2]
 
+        num_boxes = _op.shape_of(scores)
+
+        # TVM NMS assumes score > 0
+        scores = scores - _op.min(scores) + _op.const(1.0)
         # Generate data with shape (1, num_anchors, 5)
         scores = AttrCvt(op_name="expand_dims", extras={"axis": -1, "num_newaxis": 1})([scores], {})
-
-        # Prepare input data for get_valid_counts
         data = _op.concatenate([scores, boxes], -1)
         data = _op.expand_dims(data, 0, 1)
-        # Leverage get_valid_counts to sort the data and clear invalid boxes
-        ct, data, indices = get_relay_op("get_valid_counts")(
-            data, score_threshold=-1.0, id_index=-1, score_index=0
-        )
+        # PyTorch NMS doesn't have score_threshold, so no need to run get_valid_count

Review comment:
    torchvision nms doesn't filter out invalid boxes before nms, which can be super slow. Filtering out negative-score boxes should have no effect on the results. Probably we can discuss more about what we can get from this change.







[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww 
   
   Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets.
   
    See the updated `test_object_detection.py`.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww 
   
   Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets.
   
   See the updated `test_object_detection.py`. I also added an example of setting `rpn_pre_nms_top_n_test` explicitly. Compared to the default value of 1000, with 200 the runtime of GPU NMS drops from 2 seconds to 90 milliseconds.





[GitHub] [tvm] masahi commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r548408319



##########
File path: tests/python/frontend/pytorch/test_object_detection.py
##########
@@ -70,7 +71,7 @@ def generate_jit_model(index):
     ]
 
     model_func = model_funcs[index]
-    model = TraceWrapper(model_func(pretrained=True))
+    model = TraceWrapper(model_func(pretrained=True, rpn_pre_nms_top_n_test=200))

Review comment:
    I think the default parameter of 1000 they picked is fairly conservative. This means for each level in the feature pyramid, of which there are 5 if we use a resnet 50 backbone, we get a maximum of 1000 x 5 boxes as input to RPN NMS. They have another parameter, `rpn_post_nms_top_n_test`, which is like a topk applied after NMS. This value is also 1000 by default, and unlike `rpn_pre_nms_top_n_test` it is not per level. This means we always have 1000 boxes after NMS regardless of `rpn_pre_nms_top_n_test`.
   







[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my take away from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS doesn't send boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit totally makes sense - it just means the probability is < 0.5, so a totally reasonable box can end up with a negative score. We **shouldn't** arbitrarily cut negative boxes from RPN.
   
   It's highly possible that one of the reasons you didn't get the same output from pytorch detection models after compiling to TVM is due to this incorrect assumption we've been making about negative boxes, because the RPN outputs in PyTorch/TVM are totally different - we are considering only about half of them.
   
   The good news is, I found a way to reduce the number of boxes that are sent to NMS. See these parameters https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71. They are something like a `topk` parameter applied to each level separately. The default is 1000, and there are five levels in RPN NMS, which explains why we are getting about 4500 boxes into RPN NMS. If we set `rpn_pre_nms_top_n_test` to 200, we will get at most 1000 boxes, 200 for each level in the feature pyramid. That will make the detection model significantly faster while still considering a lot of boxes to keep accuracy high.
   
   So please take a look at the above issue in torchvision and my comment carefully and let's go ahead and merge this. 
   





[GitHub] [tvm] yongwww commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
yongwww commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r548404381



##########
File path: tests/python/frontend/pytorch/test_object_detection.py
##########
@@ -70,7 +71,7 @@ def generate_jit_model(index):
     ]
 
     model_func = model_funcs[index]
-    model = TraceWrapper(model_func(pretrained=True))
+    model = TraceWrapper(model_func(pretrained=True, rpn_pre_nms_top_n_test=200))

Review comment:
    Glad to see `rpn_pre_nms_top_n_test` is able to limit the proposals before nms. I am not sure if the parameter is specified in real use cases; it seems that using the default value of the parameter for benchmarking makes more sense to me.

##########
File path: python/tvm/relay/frontend/pytorch.py
##########
@@ -1857,16 +1857,18 @@ def nms(self, inputs, input_types):
         scores = inputs[1]
         iou_threshold = inputs[2]
 
+        num_boxes = _op.shape_of(scores)
+
+        # TVM NMS assumes score > 0
+        scores = scores - _op.min(scores) + _op.const(1.0)
         # Generate data with shape (1, num_anchors, 5)
         scores = AttrCvt(op_name="expand_dims", extras={"axis": -1, "num_newaxis": 1})([scores], {})
-
-        # Prepare input data for get_valid_counts
         data = _op.concatenate([scores, boxes], -1)
         data = _op.expand_dims(data, 0, 1)
-        # Leverage get_valid_counts to sort the data and clear invalid boxes
-        ct, data, indices = get_relay_op("get_valid_counts")(
-            data, score_threshold=-1.0, id_index=-1, score_index=0
-        )
+        # PyTorch NMS doesn't have score_threshold, so no need to run get_valid_count

Review comment:
    Then in this case the nms will compute over all boxes, so the perf should be bad. Not sure how PyTorch speeds up the computation for boxes with negative scores.







[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets. See the updated `test_object_detection.py`.





[GitHub] [tvm] yongwww commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
yongwww commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750765455


   > With the fix in this PR applied and using `rpn_pre_nms_top_n_test` = 200, on GTX 1070 ti
   > 
   > Torch GPU: 0.093 sec
   > TVM GPU: 0.39 sec (no tuning or cudnn)
   
   How about the perf numbers with default value of `rpn_pre_nms_top_n_test`?
   





[GitHub] [tvm] zhiics commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751302259


   Thanks @masahi @kevinthesun @yongwww 





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750766311


   > How about the perf numbers with default value of `rpn_pre_nms_top_n_test`?
   
   It is dominated by NMS which takes 2.1 seconds with default `rpn_pre_nms_top_n_test`. 
   
   But even in the default setting, PyTorch NMS is fairly fast and it is not a bottleneck at all. So the root issue is our terrible GPU performance. We need to fix our NMS and there is no excuse to cheat in the frontend. 





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets. See the updated `test_object_detection.py`.





[GitHub] [tvm] zhiics commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750702471


   BTW, could you please run the Mask R-CNN benchmark to see if there is any big performance difference?





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my take away from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS doesn't send boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit totally makes sense - it just means the probability is < 0.5, so a totally reasonable box can end up with a negative score. We **shouldn't** arbitrarily cut negative boxes from RPN.
   
   It's highly possible that one of the reasons you didn't get the same output from pytorch detection models after compiling to TVM is due to this incorrect assumption we've been making about negative boxes, because the RPN outputs in PyTorch/TVM are totally different - we are considering only about half of them.
   
   The good news is, I found a way to reduce the number of boxes that are sent to NMS. See these parameters https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71. They are something like a `topk` parameter applied to each level separately. The default is 1000, and there are five levels in RPN NMS, which explains why we are getting about 4500 boxes into RPN NMS. If we set `rpn_pre_nms_top_n_test` to 200, we will get at most 1000 boxes, 200 for each level in the feature pyramid. That will make the detection model significantly faster while still considering a lot of boxes to keep accuracy high.
   
   So please take a look at the above issue in torchvision and my comment carefully and let's go ahead and merge this. 
   





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my take away from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS doesn't send boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit totally makes sense - it just means the probability is < 0.5, so a totally reasonable box can end up with a negative score. We **shouldn't** arbitrarily cut negative boxes from RPN.
   
   It's highly possible that one of the reasons you didn't get the same output from pytorch detection models after compiling to TVM is this incorrect assumption we've been making about negative boxes, because the RPN outputs in PyTorch/TVM are totally different - we are considering only about half of them.
   
   So please take a look at the above issue in torchvision and my comment carefully and let's merge this ASAP. 
   





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750694208


   @zhiics @kevinthesun @yongwww Great news!! I've just checked that with this fix, we can now get the same number of output boxes from pytorch and TVM, which enables a meaningful comparison of the two outputs. We can do `assert_equal` with high precision on bbox coordinates and scores. Class ids are identical. Tested on both LLVM and CUDA targets. See the updated `test_object_detection.py`.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751300057


   > One interesting thing about output result is: for pytorch 1.7, we can exact match the results of tvm vs pt with this change, but for pytorch 1.4 there is still mismatch which won't affect final accuracy
   
   Interesting, I didn't test on 1.4. If even the number of output boxes is different, then it must be an NMS issue, since NMS is the only thing that could change the number of boxes. I'm also fine if we can get the same output as the latest pytorch.
   
   Thanks for validating my fix. Luckily, I also found a way to parallelize the inner loop of GPU NMS, which should give a massive speedup. The change is this one: https://github.com/masahi/tvm/commit/c75c6ef00b1e89d6a6698aeb8048be6e6f13c0ff but since I'm away from my GPU at the moment, I haven't tested it yet. It also reduces the number of IOU tests from O(N ** 2) to O(# selected boxes * N).
   
   Hopefully next week I can send a PR with good perf improvement on GPU NMS and hence PyTorch MaskRCNN performance on GPU.





[GitHub] [tvm] yongwww commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
yongwww commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750760998


   @masahi  thanks for the analysis and fix, the use of boxes with negative scores is interesting. 
   
   Previously, we used `get_valid_counts` before NMS to pre-process boxes, moving invalid boxes to the bottom of the box tensor, which helps reduce the computation of NMS by skipping invalid boxes. If negative boxes are valid, then NMS will compute over all boxes, which causes a performance regression. Then we should consider improving the performance.
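   For readers not familiar with that op, a rough numpy sketch of what `get_valid_counts` does conceptually (this is an illustration, not the actual TOPI kernel):
   ```python
   import numpy as np

   def get_valid_counts_sketch(data, score_threshold):
       # data: (num_boxes, 5) rows of [score, x1, y1, x2, y2]
       valid = data[:, 0] > score_threshold
       valid_count = int(valid.sum())
       out = np.full_like(data, -1.0)   # invalid slots padded with -1
       out[:valid_count] = data[valid]  # valid boxes moved to the top
       return valid_count, out

   # NMS then only has to walk the first `valid_count` rows, which is the
   # shortcut this PR gives up for the PyTorch frontend, since boxes with
   # negative scores are actually valid there.
   ```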
   





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750769271


   Also I find that gluoncv MaskRCNN mostly follows the design of PyTorch MaskRCNN. But they apply sigmoid to the objectness network outputs, so in their case there are no negative scores and all inputs to RPN NMS, which could be thousands of boxes, are valid. So we have to accept the fact that we need to deal with lots of boxes in MaskRCNN.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750960526


   ok I ran a CPU perf comparison of the old code vs the new code + default params. As expected, there is almost no perf regression.
   
   ``` 
   Old code: 2.1364135856199997 sec
   After the fix in this PR: 2.1756499899999997 sec
   ```
   The old code spends 5 milliseconds on NMS. Here are the stats from the profile.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_old.txt#L51
   
   The new code gets far more boxes, but NMS spends only 20 milliseconds on it.
   https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_1000.txt#L24
   
   The reason the new code doesn't make CPU perf worse is simple. Even though NMS now gets more boxes, PyTorch detection models do post-NMS topk after RPN, see https://github.com/pytorch/vision/blob/90645ccd0e774ad76200245e32222a23d09f2312/torchvision/models/detection/rpn.py#L261-L263
   
   So no matter how many boxes NMS gets, the number of boxes after NMS + post-NMS topk doesn't change and is bounded by the `rpn_post_nms_top_n_test` parameter, whose default is 1000. The CPU perf is always dominated by the large dense layer.
   
   Moreover, I'd like to reiterate that the new code is more correct and gives essentially the same output as PyTorch. At this point, if you are still not happy with this fix, I'm genuinely curious why. I suggest looking into the details of the models you are compiling and doing benchmarks carefully before making baseless claims.
   





[GitHub] [tvm] masahi commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r548408319



##########
File path: tests/python/frontend/pytorch/test_object_detection.py
##########
@@ -70,7 +71,7 @@ def generate_jit_model(index):
     ]
 
     model_func = model_funcs[index]
-    model = TraceWrapper(model_func(pretrained=True))
+    model = TraceWrapper(model_func(pretrained=True, rpn_pre_nms_top_n_test=200))

Review comment:
    I think the default parameter of 1000 they picked is fairly conservative. This means for each level in the feature pyramid, of which there are 5 if we use a resnet 50 backbone, we get a maximum of 1000 x 5 boxes as input to RPN NMS. They have another parameter, `rpn_post_nms_top_n_test`, which is like a topk applied after NMS. This value is also 1000 by default, and unlike `rpn_pre_nms_top_n_test` it is not per level. This means we always have 1000 boxes after NMS regardless of `rpn_pre_nms_top_n_test`.
   







[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my take away from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS doesn't send boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit totally makes sense - it just means the probability is < 0.5, so a totally reasonable box can end up with a negative score. We **shouldn't** arbitrarily cut negative boxes from RPN.
   
   It's highly possible that one of the reasons you didn't get the same output from pytorch detection models after compiling to TVM is due to this incorrect assumption we've been making about negative boxes, because the RPN outputs in PyTorch/TVM are totally different - we are considering only about half of them.
   
   The good news is, I found a way to reduce the number of boxes that are sent to NMS. See these parameters https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71. They are something like a `topk` parameter applied to each level separately. The default is 1000, and there are five levels in RPN NMS, which explains why we are getting about 4500 boxes into RPN NMS. If we set `rpn_pre_nms_top_n_test` to 200, we will get at most 1000 boxes, 200 for each level in the feature pyramid. That will make the detection model significantly faster while still considering a lot of boxes to keep accuracy high.
   
   So please take a look at the above issue in torchvision and my comment carefully and let's merge this ASAP. 
   





[GitHub] [tvm] kevinthesun commented on a change in pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
kevinthesun commented on a change in pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#discussion_r546923370



##########
File path: python/tvm/relay/frontend/pytorch.py
##########
@@ -1857,16 +1857,18 @@ def nms(self, inputs, input_types):
         scores = inputs[1]
         iou_threshold = inputs[2]
 
+        num_boxes = _op.shape_of(scores)
+
+        # TVM NMS assumes score > 0
+        scores = scores - _op.min(scores) + _op.const(1.0)
         # Generate data with shape (1, num_anchors, 5)
         scores = AttrCvt(op_name="expand_dims", extras={"axis": -1, "num_newaxis": 1})([scores], {})
-
-        # Prepare input data for get_valid_counts
         data = _op.concatenate([scores, boxes], -1)
         data = _op.expand_dims(data, 0, 1)
-        # Leverage get_valid_counts to sort the data and clear invalid boxes
-        ct, data, indices = get_relay_op("get_valid_counts")(
-            data, score_threshold=-1.0, id_index=-1, score_index=0
-        )
+        # PyTorch NMS doesn't have score_threshold, so no need to run get_valid_count

Review comment:
    torchvision nms doesn't filter out invalid boxes before nms, which can be super slow. Filtering out negative-score boxes should have no effect on the results. Probably we can discuss more about what we can get from this change.







[GitHub] [tvm] kevinthesun commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
kevinthesun commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751297202


   @masahi Thanks for this investigation and improvement. Indeed, this change won't affect CPU perf much, even when MKL is enabled and the large dynamic-shape dense layers become faster. One interesting thing about the output results: for PyTorch 1.7 we can exactly match the results of TVM vs PT with this change, but for PyTorch 1.4 there is still a mismatch, which won't affect final accuracy. I'm fine with this change now.
   
   BTW, enabling MKL on my Intel Xeon Platinum machine with 18 cores can reduce the latency of pt maskrcnn from 1000 ms to 600 ms. Those large dynamic shape dense layers do contribute a lot to the latency.





[GitHub] [tvm] zhiics merged pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics merged pull request #7137:
URL: https://github.com/apache/tvm/pull/7137


   





[GitHub] [tvm] masahi commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751300057


   > One interesting thing about output result is: for pytorch 1.7, we can exact match the results of tvm vs pt with this change, but for pytorch 1.4 there is still mismatch which won't affect final accuracy
   
   Interesting, I didn't test on 1.4. If even the number of output boxes is different, then it must be an NMS issue, since NMS is the only thing that could change the number of boxes. I'm also fine with this as long as we get the same output as the latest PyTorch.
   
   Thanks for validating my fix. Luckily, I also found a way to parallelize the inner loop of GPU NMS which should give a massive speedup. The change is this one 
   https://github.com/masahi/tvm/commit/c75c6ef00b1e89d6a6698aeb8048be6e6f13c0ff
   but since I'm away from my GPU at the moment, I haven't tested it yet. It also reduces the number of IOU tests from O(N ** 2) to O(# selected boxes * N).
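   A rough numpy sketch of that idea (illustrative only, not the kernel in the commit above): the outer loop runs once per *kept* box, and the IOU test of that box against all remaining candidates is a single vectorized step, which is the part that can run in parallel on the GPU.

   ```python
   import numpy as np

   def nms_sketch(boxes, scores, iou_threshold):
       """boxes: (N, 4) array of (x1, y1, x2, y2); returns indices of kept boxes."""
       order = np.argsort(-scores)
       boxes = boxes[order]
       areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
       alive = np.ones(len(boxes), dtype=bool)
       keep = []
       for i in range(len(boxes)):
           if not alive[i]:
               continue
           keep.append(order[i])
           # Vectorized IOU of the newly kept box against all later candidates.
           xx1 = np.maximum(boxes[i, 0], boxes[i + 1:, 0])
           yy1 = np.maximum(boxes[i, 1], boxes[i + 1:, 1])
           xx2 = np.minimum(boxes[i, 2], boxes[i + 1:, 2])
           yy2 = np.minimum(boxes[i, 3], boxes[i + 1:, 3])
           inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
           iou = inter / (areas[i] + areas[i + 1:] - inter)
           alive[i + 1:] &= iou <= iou_threshold
       return np.array(keep, dtype="int64")
   ```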
   
   Hopefully next week I can send a PR with a good perf improvement to GPU NMS, and hence to PyTorch MaskRCNN performance on GPU.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750769271


   Also, I find that gluoncv MaskRCNN mostly follows the design of PyTorch MaskRCNN. But they apply sigmoid to the objectness network outputs, so in their case there are no negative scores and all inputs to RPN NMS - which could be thousands of boxes - are valid, even if by "valid" we mean the previous, wrong definition of having a positive score.
   
   So we have to accept the fact that we need to deal with lots of boxes in MaskRCNN.





[GitHub] [tvm] zhiics commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750523461


   @masahi I see the problem now. Thanks for following up with the PyTorch community and finding the root cause. Ping @larrywyang @yongwww, please take a look, since you have worked on NMS more than others.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750733915


   OK, just as one data point: when I was investigating the GPU NMS performance issue, the old code was taking 630 milliseconds, while with this fix it took 2.1 seconds. But again, that's because the new code is dealing with far more boxes, and our GPU NMS code is currently extremely slow due to the sequential loop.
   
   According to the numbers I posted in https://github.com/apache/tvm/pull/7154, NMS on CPU is fast: the old code was spending only 8 milliseconds there. So I don't expect NMS on CPU to be a big issue.
    
   After NMS, the PyTorch detection model does a post-NMS topk, which selects 1000 boxes for later processing. So the perf difference should only be in NMS.





[GitHub] [tvm] zhiics commented on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
zhiics commented on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750730655


   @masahi Thanks. I was just trying to see how different it would be.





[GitHub] [tvm] masahi edited a comment on pull request #7137: [Torch] Fix PyTorch NMS conversion for negative scores

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative scores in https://github.com/pytorch/vision/issues/3198 and now I fully understand the issue. See in particular this great answer https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely wrong and needs fixing. Here is my takeaway from the above discussion:
   
   * PyTorch detection models have two uses of NMS - one in `ROIHead` and another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, they do score thresholding with a user-chosen threshold before NMS, so this NMS never sends boxes with negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and they don't apply softmax or sigmoid to the output of the objectness network. The scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), and a negative logit is perfectly sensible - it just means the probability is < 0.5. So a totally reasonable box can end up with a negative score, and we **shouldn't** arbitrarily cut negative boxes from the RPN.
   
   It's highly possible that one of the reasons you didn't get the same output from PyTorch detection models after compiling to TVM is this incorrect assumption we've been making about negative boxes: the RPN outputs in PyTorch and TVM end up totally different, because we have been considering only about half of the boxes.
   
   The good news is, I found a way to reduce the number of boxes that are sent to NMS. See these parameters https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71. They act like a `topk` parameter applied to each level separately. The default is 1000 and there are five levels in RPN NMS, which explains why we are getting about 4500 boxes at RPN NMS. If we set `rpn_pre_nms_top_n_test` to 200, we will get at most 1000 boxes - 200 boxes for each level in the feature pyramid. That will make the detection model significantly faster while still considering enough boxes to keep accuracy high.
   
   So please take a careful look at the torchvision issue above and my comment there, and let's go ahead and merge this.
   

