Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2022/02/14 14:16:36 UTC

[GitHub] [incubator-mxnet] bgawrych opened a new pull request #20894: Reduce after quantization memory usage

bgawrych opened a new pull request #20894:
URL: https://github.com/apache/incubator-mxnet/pull/20894


   ## Description ##
   This change prevents MXNet from allocating additional memory for gradients in a quantized model, since those gradients cannot be used anyway.
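   
   A minimal sketch of the underlying idea, assuming the change boils down to marking the converted parameters as not requiring gradients (the exact mechanism in this PR may differ). In Gluon, setting `grad_req='null'` on a parameter is the standard way to keep MXNet from allocating a gradient buffer for it:
   ```python
   # Hedged sketch, not the actual patch: a quantized network is inference-only,
   # so none of its parameters need gradient storage.
   from mxnet.gluon.model_zoo import vision
   
   net = vision.resnet50_v1(pretrained=True)
   
   # grad_req='null' tells Gluon not to allocate (or update) a gradient array
   # for the parameter, avoiding one extra buffer per parameter.
   for name, param in net.collect_params().items():
       param.grad_req = 'null'
   ```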
   
   Memory measurement script:
   ```python
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision
   import psutil
   import os
   
   def get_process_memory():
       process = psutil.Process(os.getpid())
       mem_info = process.memory_info()
       return mem_info.rss * 1e-6  # resident set size in megabytes
   
   
   batch_shape = (1, 3, 224, 224)
   data = mx.np.random.normal(size=batch_shape)
   
   print("memory before loading model: ", get_process_memory())
   net = vision.resnet50_v1(pretrained=True)
   print("memory after loading model: ", get_process_memory())
   out = net(data)
   out.wait_to_read()
   print("memory after fp32 forward pass", get_process_memory())
   
   dataset = mx.gluon.data.ArrayDataset(data)
   data_loader = mx.gluon.data.DataLoader(dataset, batch_size=1)
   net_quantized = mx.contrib.quant.quantize_net(net, quantized_dtype='int8',
                                                   quantize_mode="smart",
                                                   calib_mode='naive',
                                                   calib_data=data_loader,
                                                   num_calib_batches=1,
                                                   ctx=mx.current_context())
   
   print("memory after quantization: ", get_process_memory())
   
   outputs = net_quantized(data)
   outputs.wait_to_read()
   print("memory after int8 forward pass: ", get_process_memory())
   ```
   **Output before:**
   ```
   memory before loading model:  213.430272
   [15:14:11] ../src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
   memory after loading model:  530.702336
   memory after fp32 forward pass 611.241984
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1918: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1251: UserWarning: register_op_hook is experimental when static_alloc=True / static_shape=True  and may not work correctly
     warnings.warn("register_op_hook is experimental when static_alloc=True / static_shape=True "
   memory after quantization:  1064.57088
   memory after int8 forward pass:  1071.005696
   ```
   
   **Output after:**
   ```
   memory before loading model:  214.28633599999998
   [15:13:17] ../src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
   memory after loading model:  531.2593919999999
   memory after fp32 forward pass 609.513472
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1918: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
           data: None
     input_sym_arg_type = in_param.infer_type()[0]
   /home/bg/work/MXNet/python/mxnet/gluon/block.py:1251: UserWarning: register_op_hook is experimental when static_alloc=True / static_shape=True  and may not work correctly
     warnings.warn("register_op_hook is experimental when static_alloc=True / static_shape=True "
   memory after quantization:  890.273792
   memory after int8 forward pass:  895.2258559999999
   ```
   
   A significant reduction in memory usage can be observed: resident memory after quantization drops from ~1064 MB to ~890 MB, and after the int8 forward pass from ~1071 MB to ~895 MB (roughly 175 MB, or about 16%).
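   
   A quick way to double-check this from Python (hedged: it assumes the converted parameters end up with `grad_req='null'`, i.e. no gradient buffer is requested for them) is to inspect the quantized network at the end of the measurement script above:
   ```python
   # Hypothetical verification step appended to the script above:
   # every parameter of net_quantized should report grad_req='null',
   # meaning no gradient array is allocated for it.
   for name, param in net_quantized.collect_params().items():
       print(name, param.grad_req)
   ```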


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #20894: Reduce after quantization memory usage

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #20894:
URL: https://github.com/apache/incubator-mxnet/pull/20894#issuecomment-1039137401


   Hey @bgawrych, thanks for submitting the PR. 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [edge, miscellaneous, windows-gpu, centos-cpu, unix-cpu, unix-gpu, windows-cpu, website, sanity, clang, centos-gpu]
   *** 
   _Note_: 
    Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   





[GitHub] [incubator-mxnet] bgawrych merged pull request #20894: Reduce after quantization memory usage

Posted by GitBox <gi...@apache.org>.
bgawrych merged pull request #20894:
URL: https://github.com/apache/incubator-mxnet/pull/20894


   

