Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/11 04:55:39 UTC
[GitHub] [incubator-mxnet] ZhennanQin opened a new pull request #14931: Re-enable static cached_op optimization

URL: https://github.com/apache/incubator-mxnet/pull/14931
 
 
   ## Description ##
   This PR was first filed as https://github.com/apache/incubator-mxnet/pull/14785, but it was reverted because of https://github.com/dmlc/gluon-nlp/issues/690. That issue is now fixed.
   
   With the fix, the output is:
   ```
   $ python run_pretraining.py --gpus 0 --batch_size 32 --lr 2e-5 --data 'out/*.npz' --warmup_ratio 0.5 --num_steps 20 --pretrained --log_interval=2 --data_eval 'out/*.npz' --batch_size_eval 8 --ckpt_dir ckpt
   INFO:root:Namespace(accumulate=1, batch_size=32, batch_size_eval=8, ckpt_dir='ckpt', ckpt_interval=250000, data='out/*.npz', data_eval='out/*.npz', dataset_name='book_corpus_wiki_en_uncased', dtype='float32', dummy_data_len=None, gpus='0', kvstore='device', log_interval=2, lr=2e-05, model='bert_12_768_12', num_buckets=1, num_steps=20, pretrained=True, profile=None, seed=0, start_step=0, use_avg_len=False, verbose=False, warmup_ratio=0.5)
   [12:41:47] src/storage/storage.cc:108: Using GPUPooledRoundedStorageManager.
   INFO:root:Using training data at out/*.npz
   INFO:root:[step 1]      mlm_loss=1.57931        mlm_acc=46.99793        nsp_loss=0.32225        nsp_acc=84.375  throughput=2.4K tks/s   lr=0.0000020 time=1.34, latency=668.3 ms/batch
   INFO:root:[step 3]      mlm_loss=3.27771        mlm_acc=48.94737        nsp_loss=0.47513        nsp_acc=88.333  throughput=6.8K tks/s   lr=0.0000060 time=0.93, latency=466.5 ms/batch
   INFO:root:[step 5]      mlm_loss=2.79173        mlm_acc=50.60241        nsp_loss=0.12542        nsp_acc=92.857  throughput=6.2K tks/s   lr=0.0000100 time=0.91, latency=453.5 ms/batch
   INFO:root:[step 7]      mlm_loss=2.81245        mlm_acc=52.21421        nsp_loss=0.25357        nsp_acc=92.188  throughput=6.5K tks/s   lr=0.0000140 time=1.00, latency=499.9 ms/batch
   INFO:root:[step 9]      mlm_loss=2.09288        mlm_acc=62.17143        nsp_loss=0.01765        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000180 time=0.93, latency=464.5 ms/batch
   INFO:root:[step 11]     mlm_loss=1.53132        mlm_acc=71.63121        nsp_loss=0.00595        nsp_acc=100.000 throughput=6.8K tks/s   lr=0.0000090 time=0.98, latency=490.6 ms/batch
   INFO:root:[step 13]     mlm_loss=1.04932        mlm_acc=79.85866        nsp_loss=0.00190        nsp_acc=100.000 throughput=5.9K tks/s   lr=0.0000070 time=0.97, latency=484.6 ms/batch
   INFO:root:[step 15]     mlm_loss=0.93987        mlm_acc=81.75027        nsp_loss=0.00284        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000050 time=1.00, latency=500.4 ms/batch
   INFO:root:[step 17]     mlm_loss=0.68529        mlm_acc=86.50794        nsp_loss=0.00250        nsp_acc=100.000 throughput=6.6K tks/s   lr=0.0000030 time=1.03, latency=516.9 ms/batch
   INFO:root:[step 19]     mlm_loss=0.50951        mlm_acc=89.44900        nsp_loss=0.00218        nsp_acc=100.000 throughput=6.2K tks/s   lr=0.0000010 time=0.92, latency=460.7 ms/batch
   INFO:root:[step 20] Saving checkpoints to ckpt/0000020.params, ckpt/0000020.states.
   INFO:root:Train cost=26.3s
   INFO:root:Using evaluation data at out/*.npz
   INFO:root:[step 1]      mlm_loss=0.20752        mlm_acc=93.26923        nsp_loss=0.00008        nsp_acc=100.000 throughput=3.0K tks/s   lr=0.0000000 time=0.23, latency=115.1 ms/batch
   INFO:root:[step 3]      mlm_loss=0.43117        mlm_acc=91.76955        nsp_loss=0.00139        nsp_acc=100.000 throughput=10.5K tks/s  lr=0.0000000 time=0.15, latency=77.5 ms/batch
   INFO:root:[step 5]      mlm_loss=0.39373        mlm_acc=92.27941        nsp_loss=0.00016        nsp_acc=100.000 throughput=12.0K tks/s  lr=0.0000000 time=0.15, latency=76.4 ms/batch
   INFO:root:[step 7]      mlm_loss=0.37862        mlm_acc=91.18943        nsp_loss=0.00093        nsp_acc=100.000 throughput=10.4K tks/s  lr=0.0000000 time=0.15, latency=73.4 ms/batch
   INFO:root:mlm_loss=0.406        mlm_acc=92.0    nsp_loss=0.001  nsp_acc=100.0
   INFO:root:Eval cost=0.7s
   ```
   
   @pengzhao-intel @eric-haibin-lin 
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services