Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/25 13:07:17 UTC

[GitHub] [incubator-mxnet] ElaineBao opened a new pull request #17906: Optimize AddTakeGrad Tensor Sum

ElaineBao opened a new pull request #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906
 
 
   ## Description ##
   `AddTakeGrad` is used in the backward pass of the embedding operator. It originally used tensor-level summation, which is very slow. Replacing the tensor-level summation with element-wise summation makes this function significantly faster (about a 6X speedup on a dummy example).
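For context, here is a minimal NumPy sketch of what the element-wise accumulation computes (illustrative only; `add_take_grad` below is a stand-in for the mshadow/MXNet C++ implementation, not the actual code in this PR):

```python
import numpy as np

def add_take_grad(dst, index, src):
    """Accumulate output gradients into the weight-gradient rows
    selected by the input indices: dst[index[i]] += src[i]."""
    for i in range(index.shape[0]):
        dst[index[i]] += src[i]
    return dst

vocab, dim = 4, 3
grad_w = np.zeros((vocab, dim))        # d(loss)/d(embedding weight)
idx = np.array([1, 3, 1])              # token ids looked up in forward
grad_out = np.ones((3, dim))           # d(loss)/d(embedding output)
add_take_grad(grad_w, idx, grad_out)
# Row 1 receives two contributions, row 3 one, rows 0 and 2 none.
```

The key property is that repeated indices accumulate into the same row, which is why the backward pass is a sum rather than a plain scatter.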
   
   @xinyu-intel @zixuanweeei @TaoLv please review.
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609039636
 
 
   The CI issue should already be addressed. Please rebase your PR and resolve the comments. Thanks.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609459052
 
 
   Opperf is to be fixed by https://github.com/apache/incubator-mxnet/pull/17894

----------------------------------------------------------------

[GitHub] [incubator-mxnet] TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609429305
 
 
   @ElaineBao  Thank you. Impressive speedup!

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604173004
 
 
   @ElaineBao do you expect this to be true at other places where tensor-level summation is used? Should these places be checked / fixed too?

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609050759
 
 
   Sorry for the late reply.
   I tried to use opperf, but it doesn't work; it threw an error when I ran it:
   ```
   #   File "/incubator-mxnet/benchmark/opperf/rules/default_params.py", line 606, in <module>
   #     "axis_shape": DEFAULT_AXIS_SHAPE,
   # NameError: name 'DEFAULT_AXIS_SHAPE' is not defined
   ```
   
   So I used the MXNet profiler to validate the performance instead, which I think is also reasonable.
   The script is as follows:
   ```python
   import random
   import pandas as pd
   import mxnet as mx
   import numpy as np
   from sklearn.model_selection import train_test_split
   
   batch_size = 1000
   num_epoch = 5
   model_prefix = 'drivethru_attention_d'
   n_plus= 522
   total = 40000
   profiling = True
   
   records = []
   for i in range(0, total):
       pluids = [random.randint(0, n_plus - 1) for i in range(0, 5)]
       label = random.randint(0, 1)
       records.append((pluids, label))
   
   data = pd.DataFrame(records,
                       columns=['pluids','label'])
   train, test = train_test_split(data, test_size=0.1, random_state=100)
   
   X_train = mx.io.NDArrayIter(data={'pluids': np.array(train['pluids'].values.tolist(), dtype=int)},
                               label={'output_label': train['label'].values},
                               batch_size=batch_size,
                               shuffle=True)
   X_eval = mx.io.NDArrayIter(data={'pluids': np.array(test['pluids'].values.tolist(), dtype=int)},
                               label={'output_label': test['label'].values},
                               batch_size=batch_size,
                               shuffle=True)
   y_true = mx.symbol.Variable('output_label')
   
   
   pluids = mx.symbol.Variable('pluids')
   plu_embed = mx.symbol.Embedding(data=pluids, input_dim=n_plus, output_dim=50, name='plu_embed')
   
   fc1 = mx.symbol.FullyConnected(data=plu_embed, num_hidden=int(n_plus), name='fc1')
   rec_model = mx.symbol.SoftmaxOutput(data=fc1, label=y_true, name='output')
   
   mod = mx.mod.Module(symbol=rec_model,
                       data_names=['pluids'],
                       label_names=['output_label'],
                       context=[mx.cpu()])
   # enable profiler
   mx.profiler.set_config(profile_symbolic=True, profile_imperative=True, profile_memory=False,
                                   profile_api=True, filename='profile.json', aggregate_stats=True)
   mx.profiler.set_state('run')
   
   mod.fit(train_data=X_train,
           num_epoch=num_epoch,
           initializer=mx.init.Xavier(rnd_type="gaussian"),
           optimizer='adagrad',
           eval_metric=['accuracy'],
           validation_metric=['accuracy', mx.metric.TopKAccuracy(3)],
           eval_data=X_eval,
           batch_end_callback=mx.callback.Speedometer(batch_size, 2))
   
   mx.profiler.set_state('stop')
   print(mx.profiler.dumps())
   ```
   
   And the performance:
   1. Before the optimization (`_backward_Embedding` dominated the profile):
   ```
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------    -------------    -------------
   _backward_Embedding                   180        2854.3340          12.3320          29.1350          15.8574
   _mul_scalar                          1620         527.4130           0.0030           3.3110           0.3256
   _backward_FullyConnected              180         162.2140           0.7430           1.6510           0.9012
   SoftmaxOutput                         200         129.6620           0.1250           1.2650           0.6483
   FullyConnected                        200         110.0570           0.2340          42.0660           0.5503
   argmax                                200          49.5320           0.1840           0.4930           0.2477
   broadcast_add                        1080          31.0420           0.0040           2.9860           0.0287
   Embedding                             200          25.0530           0.0240           3.8110           0.1253
   _backward_SoftmaxOutput               180          19.0860           0.0560           0.8680           0.1060
   square                                540          18.5240           0.0030           2.8510           0.0343
   sqrt                                  540          17.3870           0.0060           0.9440           0.0322
   DeleteVariable                       3532          11.3070           0.0020           0.0330           0.0032
   broadcast_sub                         540           8.2790           0.0040           0.0440           0.0153
   broadcast_div                         540           7.4970           0.0050           0.0730           0.0139
   _plus_scalar                          555           6.5850           0.0040           0.0650           0.0119
   SetValueOp                              8           5.8160           0.0050           5.6540           0.7270
   CopyCPU2CPU                           448           4.1040           0.0020           0.1030           0.0092
   ResourceParallelRandomSetSeed               1           3.8440           3.8440           3.8440           3.8440
   WaitForVar                            220           1.3150           0.0040           0.0120           0.0060
   Cast                                   33           1.1590           0.0080           0.2680           0.0351
   _random_normal                          2           0.7270           0.1210           0.6060           0.3635
   _zeros                                  6           0.2650           0.0070           0.0720           0.0442
   _div_scalar                            15           0.1400           0.0050           0.0190           0.0093
   SetupExec                               6           0.0150           0.0010           0.0060           0.0025
   _full                                   1           0.0060           0.0060           0.0060           0.0060
   ```
   
   2. After the optimization:
   ```
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)    Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------    -------------    -------------
   _mul_scalar                          1620         451.0970           0.0030           3.2960           0.2785
   _backward_FullyConnected              180         195.9230           0.7440           2.3720           1.0885
   SoftmaxOutput                         200         156.3020           0.1080           1.2910           0.7815
   FullyConnected                        200         136.2320           0.2300          43.5920           0.6812
   argmax                                200          54.5550           0.1710           0.4960           0.2728
   _backward_SoftmaxOutput               180          39.8900           0.0570           0.8930           0.2216
   Embedding                             200          27.0910           0.0270           3.1330           0.1355
   broadcast_add                        1080          24.6370           0.0040           0.6560           0.0228
   _backward_Embedding                   180          21.5230           0.0970           0.4120           0.1196
   sqrt                                  540          20.1840           0.0060           0.1300           0.0374
   square                                540          19.2420           0.0040           2.9200           0.0356
   DeleteVariable                       3532          13.1160           0.0010           0.1310           0.0037
   broadcast_sub                         540          11.0550           0.0040           0.0980           0.0205
   broadcast_div                         540           9.3750           0.0050           0.1110           0.0174
   _plus_scalar                          555           8.2280           0.0040           0.1140           0.0148
   SetValueOp                              8           5.9090           0.0050           5.7620           0.7386
   CopyCPU2CPU                           448           4.2760           0.0030           0.1040           0.0095
   ResourceParallelRandomSetSeed               1           3.8370           3.8370           3.8370           3.8370
   Cast                                   33           1.2800           0.0090           0.2670           0.0388
   WaitForVar                            195           1.2160           0.0040           0.0180           0.0062
   _random_normal                          2           0.7190           0.1200           0.5990           0.3595
   _zeros                                  6           0.2710           0.0060           0.0790           0.0452
   _div_scalar                            15           0.2610           0.0050           0.0790           0.0174
   SetupExec                               6           0.0150           0.0020           0.0050           0.0025
   _full                                   1           0.0070           0.0070           0.0070           0.0070
   ```

----------------------------------------------------------------

[GitHub] [incubator-mxnet] mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169969
 
 
   Jenkins CI successfully triggered : [windows-gpu, unix-gpu]

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604173004
 
 
   @ElaineBao do you expect this to be true at other places where tensor-level summation is used? If so, how about removing the tensor-level summation feature from mshadow and fix all places in the mxnet codebase?

----------------------------------------------------------------

[GitHub] [incubator-mxnet] mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603828047
 
 
   Hey @ElaineBao , Thanks for submitting the PR 
   Once your PR is ready for CI checks, invoke the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [unix-cpu, edge, centos-gpu, windows-cpu, miscellaneous, sanity, unix-gpu, windows-gpu, clang, website, centos-cpu]
   *** 
   _Note_: 
    Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604176384
 
 
   @leezu I can't say that all tensor-level summation is slow; after all, I haven't tested all the cases.
   
   But changing tensor-level summation to element-wise summation actually increases the amount of code and makes it less readable, so unless there is a known efficiency issue, I think it's better to leave the code unchanged and keep using tensor-level summation.
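The trade-off can be sketched in NumPy (an illustrative stand-in, not the mshadow code; both function names here are hypothetical): the two forms compute the same gradient, but the element-wise version spells out the inner loop over the embedding dimension.

```python
import numpy as np

def grad_tensor_level(dst, index, src):
    # Tensor-level form: one whole-row slice update per index.
    for i in range(index.shape[0]):
        dst[index[i]] += src[i]

def grad_element_wise(dst, index, src):
    # Element-wise form: explicit inner loop over each dimension.
    rows, dim = src.shape
    for i in range(rows):
        j = index[i]
        for k in range(dim):
            dst[j, k] += src[i, k]

idx = np.array([0, 2, 0])
src = np.arange(12, dtype=float).reshape(3, 4)
a = np.zeros((3, 4))
b = np.zeros((3, 4))
grad_tensor_level(a, idx, src)
grad_element_wise(b, idx, src)
# Both produce identical gradients; only verbosity differs.
```

The results are identical, which is the point: the element-wise rewrite buys speed at the cost of more code for the same math.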

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169748
 
 
   > Is tensor-level summation slow as the compiler fails to optimize the mshadow implementation? Or what is the reason?
   
   Not quite sure about the reason, but I suspect it may be related to temporary memory allocation.
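As a rough analogy for the temporary-allocation hypothesis (NumPy, not mshadow's expression templates), the distinction is observable: an out-of-place sum materializes a fresh buffer before the write-back, while in-place accumulation writes into the destination directly.

```python
import numpy as np

dst = np.zeros(1000)
src = np.ones(1000)

# Out-of-place: `dst + src` allocates a fresh temporary array.
tmp = dst + src
assert not np.shares_memory(tmp, dst)

# In-place: `+=` accumulates into dst's existing buffer.
buf = dst.__array_interface__['data'][0]
dst += src
assert dst.__array_interface__['data'][0] == buf
```

Whether this is actually what dominates in the mshadow path is unconfirmed here; profiling the allocator would be needed to verify it.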

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604189163
 
 
   @TaoLv OK, I'll work on it

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609135780
 
 
   @mxnet-bot run ci [unix-gpu, unix-cpu]

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu merged pull request #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
leezu merged pull request #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906
 
 
   

----------------------------------------------------------------

[GitHub] [incubator-mxnet] mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609135956
 
 
   Jenkins CI successfully triggered : [unix-gpu, unix-cpu]

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169929
 
 
   @mxnet-bot run ci [unix-gpu, windows-gpu]

----------------------------------------------------------------

[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603828312
 
 
   @mxnet-bot run ci [all]

----------------------------------------------------------------

[GitHub] [incubator-mxnet] TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604180544
 
 
   @ElaineBao Could you please share a benchmarking script so we can verify the effect of this optimization? opperf may help: https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609430145
 
 
   > Sorry for the late reply.
   > I tried to use opperf, but it doesn't work; it threw an error when I ran it:
   > File "/incubator-mxnet/benchmark/opperf/rules/default_params.py", line 606, in <module>
   >     "axis_shape": DEFAULT_AXIS_SHAPE,
   > NameError: name 'DEFAULT_AXIS_SHAPE' is not defined
   
   FYI, @ChaiBapchya.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603968342
 
 
   Is tensor-level summation slow as the compiler fails to optimize the mshadow implementation? Or what is the reason?

----------------------------------------------------------------