Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/03 13:35:41 UTC

[GitHub] [incubator-mxnet] kpuatamazon commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers

kpuatamazon commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers 
URL: https://github.com/apache/incubator-mxnet/pull/17559#issuecomment-593952925
 
 
   The quantization operator is now parallelized with OpenMP and supports an arbitrary number of arguments. It is substantially faster than the current MXNet implementation on both 1 and 24 cores, @pengzhao-intel (benchmarks below).  
   
   Maybe this pull request is too big. Would you be happy with a smaller pull request that takes the faster quantization code and replaces the implementation of the existing `quantize` and `quantize_v2` operators, so that it also appears in the quantization flow?  
   
   We can then carry on with the matrix multiply in a follow-up.  
   
   @ciyongch I'm calling the operators manually because we're using Gluon and the quantization workflow doesn't work for us anyway. But if you're game to have the existing operators optimized, the speedup will automatically show up in the workflow too.  
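   
   For illustration, here is a minimal sketch of what calling the operators manually might look like. The `intgemm_prepare_data` name and signature are assumptions based on how the operator is exposed in later MXNet releases, not necessarily what this PR registers:  
   
   ```
   import mxnet as mx
   
   x = mx.nd.random.uniform(-1.0, 1.0, shape=(512, 512))
   
   # Existing operator: int8 quantization with a pre-supplied ("calibrated") range.
   q, q_min, q_max = mx.nd.contrib.quantize_v2(
       x, out_type='int8', min_calib_range=-1.0, max_calib_range=1.0)
   
   # intgemm path (assumed entry point): scale by 127/maxabs, round, and lay the
   # int8 data out in the format the intgemm multiply expects.
   maxabs = mx.nd.max(mx.nd.abs(x))
   q_intgemm = mx.nd.contrib.intgemm_prepare_data(x, maxabs)
   
   mx.nd.waitall()  # block until MXNet's asynchronous engine has finished
   ```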
   
   ```
   OMP_NUM_THREADS=24 ./quant_bench.py
   Shape (1, 1)
   0.0001304 seconds for quantize
   0.0001076 seconds for quantize_v2
   0.0000310 seconds for intgemm
   0.0001114 seconds for quantize_v2_fit
   0.0000479 seconds for intgemm_fit
   intgemm is 3.5x faster with calibration
   intgemm is 2.3x faster without calibration
   Shape (128, 128)
   0.0001649 seconds for quantize
   0.0001399 seconds for quantize_v2
   0.0000329 seconds for intgemm
   0.0001533 seconds for quantize_v2_fit
   0.0000502 seconds for intgemm_fit
   intgemm is 4.2x faster with calibration
   intgemm is 3.1x faster without calibration
   Shape (256, 256)
   0.0001660 seconds for quantize
   0.0001404 seconds for quantize_v2
   0.0000335 seconds for intgemm
   0.0001599 seconds for quantize_v2_fit
   0.0000505 seconds for intgemm_fit
   intgemm is 4.2x faster with calibration
   intgemm is 3.2x faster without calibration
   Shape (512, 512)
   0.0001691 seconds for quantize
   0.0001434 seconds for quantize_v2
   0.0000342 seconds for intgemm
   0.0001813 seconds for quantize_v2_fit
   0.0000540 seconds for intgemm_fit
   intgemm is 4.2x faster with calibration
   intgemm is 3.4x faster without calibration
   Shape (1024, 1024)
   0.0001920 seconds for quantize
   0.0001538 seconds for quantize_v2
   0.0000511 seconds for intgemm
   0.0002390 seconds for quantize_v2_fit
   0.0000827 seconds for intgemm_fit
   intgemm is 3.0x faster with calibration
   intgemm is 2.9x faster without calibration
   Shape (2048, 2048)
   0.0002364 seconds for quantize
   0.0001989 seconds for quantize_v2
   0.0000875 seconds for intgemm
   0.0004747 seconds for quantize_v2_fit
   0.0001531 seconds for intgemm_fit
   intgemm is 2.3x faster with calibration
   intgemm is 3.1x faster without calibration
   Shape (20971520,)
   0.0011446 seconds for quantize
   0.0010902 seconds for quantize_v2
   0.0008950 seconds for intgemm
   0.0023337 seconds for quantize_v2_fit
   0.0015005 seconds for intgemm_fit
   intgemm is 1.2x faster with calibration
   intgemm is 1.6x faster without calibration
   Shape (8, 4096)
   0.0001636 seconds for quantize
   0.0001392 seconds for quantize_v2
   0.0000364 seconds for intgemm
   0.0001508 seconds for quantize_v2_fit
   0.0000651 seconds for intgemm_fit
   intgemm is 3.8x faster with calibration
   intgemm is 2.3x faster without calibration
   Shape (4096, 8)
   0.0001642 seconds for quantize
   0.0001392 seconds for quantize_v2
   0.0000370 seconds for intgemm
   0.0001515 seconds for quantize_v2_fit
   0.0000654 seconds for intgemm_fit
   intgemm is 3.8x faster with calibration
   intgemm is 2.3x faster without calibration
   ```
   ```
   OMP_NUM_THREADS=1 ./quant_bench.py
   Shape (1, 1)
   0.0000630 seconds for quantize
   0.0000706 seconds for quantize_v2
   0.0000294 seconds for intgemm
   0.0000632 seconds for quantize_v2_fit
   0.0000475 seconds for intgemm_fit
   intgemm is 2.1x faster with calibration
   intgemm is 1.3x faster without calibration
   Shape (128, 128)
   0.0000860 seconds for quantize
   0.0000898 seconds for quantize_v2
   0.0000324 seconds for intgemm
   0.0000996 seconds for quantize_v2_fit
   0.0000464 seconds for intgemm_fit
   intgemm is 2.6x faster with calibration
   intgemm is 2.1x faster without calibration
   Shape (256, 256)
   0.0000976 seconds for quantize
   0.0001028 seconds for quantize_v2
   0.0000339 seconds for intgemm
   0.0001513 seconds for quantize_v2_fit
   0.0000521 seconds for intgemm_fit
   intgemm is 2.9x faster with calibration
   intgemm is 2.9x faster without calibration
   Shape (512, 512)
   0.0001724 seconds for quantize
   0.0001693 seconds for quantize_v2
   0.0000839 seconds for intgemm
   0.0004351 seconds for quantize_v2_fit
   0.0001420 seconds for intgemm_fit
   intgemm is 2.0x faster with calibration
   intgemm is 3.1x faster without calibration
   Shape (1024, 1024)
   0.0003559 seconds for quantize
   0.0003481 seconds for quantize_v2
   0.0002384 seconds for intgemm
   0.0013803 seconds for quantize_v2_fit
   0.0004667 seconds for intgemm_fit
   intgemm is 1.5x faster with calibration
   intgemm is 3.0x faster without calibration
   Shape (2048, 2048)
   0.0011425 seconds for quantize
   0.0010880 seconds for quantize_v2
   0.0008497 seconds for intgemm
   0.0051828 seconds for quantize_v2_fit
   0.0018427 seconds for intgemm_fit
   intgemm is 1.3x faster with calibration
   intgemm is 2.8x faster without calibration
   Shape (20971520,)
   0.0101917 seconds for quantize
   0.0096956 seconds for quantize_v2
   0.0071391 seconds for intgemm
   0.0305159 seconds for quantize_v2_fit
   0.0140535 seconds for intgemm_fit
   intgemm is 1.4x faster with calibration
   intgemm is 2.2x faster without calibration
   Shape (8, 4096)
   0.0000880 seconds for quantize
   0.0000950 seconds for quantize_v2
   0.0000334 seconds for intgemm
   0.0001183 seconds for quantize_v2_fit
   0.0000423 seconds for intgemm_fit
   intgemm is 2.6x faster with calibration
   intgemm is 2.8x faster without calibration
   Shape (4096, 8)
   0.0000900 seconds for quantize
   0.0000949 seconds for quantize_v2
   0.0000332 seconds for intgemm
   0.0001215 seconds for quantize_v2_fit
   0.0000433 seconds for intgemm_fit
   intgemm is 2.7x faster with calibration
   intgemm is 2.8x faster without calibration
   ```
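   The benchmark script itself isn't in this comment, so for reference here is a rough sketch of a harness that could produce output in this format. The operator entry points, the interpretation of the `_fit` rows (the quantization range is computed from the data inside the timed region rather than supplied), and the timing loop are assumptions, not the PR's `quant_bench.py`:  
   
   ```
   #!/usr/bin/env python3
   # Rough sketch of a quant_bench.py-style harness (not the script from the PR).
   import time
   import mxnet as mx
   
   SHAPES = [(1, 1), (128, 128), (256, 256), (512, 512), (1024, 1024),
             (2048, 2048), (20971520,), (8, 4096), (4096, 8)]
   
   def bench(fn, repeat=100):
       """Mean wall-clock seconds per call, synchronizing MXNet's async engine."""
       fn(); mx.nd.waitall()                 # warm up
       start = time.time()
       for _ in range(repeat):
           fn()
       mx.nd.waitall()
       return (time.time() - start) / repeat
   
   for shape in SHAPES:
       data = mx.nd.random.uniform(-1.0, 1.0, shape=shape)
       min_r, max_r = mx.nd.array([-1.0]), mx.nd.array([1.0])
       maxabs = mx.nd.max(mx.nd.abs(data))
       print("Shape", shape)
       # Pre-calibrated range supplied to the operators.
       q = bench(lambda: mx.nd.contrib.quantize(data, min_r, max_r, out_type='int8'))
       v2 = bench(lambda: mx.nd.contrib.quantize_v2(
           data, out_type='int8', min_calib_range=-1.0, max_calib_range=1.0))
       ig = bench(lambda: mx.nd.contrib.intgemm_prepare_data(data, maxabs))
       # "_fit" variants: range computed from the data inside the timed region.
       v2_fit = bench(lambda: mx.nd.contrib.quantize_v2(data, out_type='int8'))
       ig_fit = bench(lambda: mx.nd.contrib.intgemm_prepare_data(
           data, mx.nd.max(mx.nd.abs(data))))
       for name, t in [("quantize", q), ("quantize_v2", v2), ("intgemm", ig),
                       ("quantize_v2_fit", v2_fit), ("intgemm_fit", ig_fit)]:
           print("%.7f seconds for %s" % (t, name))
       print("intgemm is %.1fx faster with calibration" % (v2 / ig))
       print("intgemm is %.1fx faster without calibration" % (v2_fit / ig_fit))
   ```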
