Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/25 13:07:17 UTC
[GitHub] [incubator-mxnet] ElaineBao opened a new pull request #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906
## Description ##
The `AddTakeGrad` function is used in the backward pass of the embedding operator. It originally used tensor-level summation, which is very slow. Replacing the tensor-level summation with element-wise summation makes this function much faster (about a 6x speedup on a dummy example).
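For context, the semantics of `AddTakeGrad` can be sketched in plain Python (a hypothetical illustration of what the operator computes, not the actual C++ implementation): each row of the output gradient is accumulated, one element at a time, into the embedding-weight gradient at the row selected by the corresponding index.

```python
def add_take_grad(dst, index, src):
    """Sketch of AddTakeGrad semantics: dst[index[i]][j] += src[i][j].

    dst: gradient buffer for the embedding weight, shape (vocab, dim)
    index: the indices that were fed to the embedding lookup, length n
    src: output gradients, shape (n, dim)
    """
    for i, row_id in enumerate(index):
        for j in range(len(src[i])):
            # element-wise accumulation, one scalar add at a time
            dst[row_id][j] += src[i][j]
    return dst

# tiny example: two lookups hitting the same row accumulate into it
grad = [[0.0, 0.0], [0.0, 0.0]]
add_take_grad(grad, [1, 1, 0], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Here index `1` receives `[1, 2] + [3, 4]` and index `0` receives `[5, 6]`; the PR keeps these semantics and only changes how the accumulation is performed.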
@xinyu-intel @zixuanweeei @TaoLv please review.
## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609039636
The CI issue should already be addressed. Please rebase your PR and resolve the comments. Thanks.
----------------------------------------------------------------
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609459052
Opperf to be fixed by https://github.com/apache/incubator-mxnet/pull/17894
----------------------------------------------------------------
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609429305
@ElaineBao Thank you. Impressive speedup!
----------------------------------------------------------------
leezu edited a comment on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604173004
@ElaineBao do you expect this to be true at other places where tensor-level summation is used? Should these places be checked / fixed too?
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609050759
Sorry for the late reply.
I tried to use opperf, but it doesn't work; it throws an error:
```
  File "/incubator-mxnet/benchmark/opperf/rules/default_params.py", line 606, in <module>
    "axis_shape": DEFAULT_AXIS_SHAPE,
NameError: name 'DEFAULT_AXIS_SHAPE' is not defined
```
So I used the MXNet profiler to validate the performance instead, which I think is also reasonable. The script is as follows:
```python
import random
import pandas as pd
import mxnet as mx
import numpy as np
from sklearn.model_selection import train_test_split

batch_size = 1000
num_epoch = 5
model_prefix = 'drivethru_attention_d'
n_plus = 522
total = 40000
profiling = True

records = []
for i in range(0, total):
    pluids = [random.randint(0, n_plus - 1) for i in range(0, 5)]
    label = random.randint(0, 1)
    records.append((pluids, label))
data = pd.DataFrame(records, columns=['pluids', 'label'])
train, test = train_test_split(data, test_size=0.1, random_state=100)

X_train = mx.io.NDArrayIter(data={'pluids': np.array(train['pluids'].values.tolist(), dtype=int)},
                            label={'output_label': train['label'].values},
                            batch_size=batch_size,
                            shuffle=True)
X_eval = mx.io.NDArrayIter(data={'pluids': np.array(test['pluids'].values.tolist(), dtype=int)},
                           label={'output_label': test['label'].values},
                           batch_size=batch_size,
                           shuffle=True)

y_true = mx.symbol.Variable('output_label')
pluids = mx.symbol.Variable('pluids')
plu_embed = mx.symbol.Embedding(data=pluids, input_dim=n_plus, output_dim=50, name='plu_embed')
fc1 = mx.symbol.FullyConnected(data=plu_embed, num_hidden=int(n_plus), name='fc1')
rec_model = mx.symbol.SoftmaxOutput(data=fc1, label=y_true, name='output')
mod = mx.mod.Module(symbol=rec_model,
                    data_names=['pluids'],
                    label_names=['output_label'],
                    context=[mx.cpu()])

# enable profiler
mx.profiler.set_config(profile_symbolic=True, profile_imperative=True, profile_memory=False,
                       profile_api=True, filename='profile.json', aggregate_stats=True)
mx.profiler.set_state('run')
mod.fit(train_data=X_train,
        num_epoch=num_epoch,
        initializer=mx.init.Xavier(rnd_type="gaussian"),
        optimizer='adagrad',
        eval_metric=['accuracy'],
        validation_metric=['accuracy', mx.metric.TopKAccuracy(3)],
        eval_data=X_eval,
        batch_end_callback=mx.callback.Speedometer(batch_size, 2))
mx.profiler.set_state('stop')
print(mx.profiler.dumps())
```
And the performance:
1. Before the optimization of `_backward_Embedding`:
```
operator
=================
Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
---- ----------- --------- ------------- ------------- -------------
_backward_Embedding 180 2854.3340 12.3320 29.1350 15.8574
_mul_scalar 1620 527.4130 0.0030 3.3110 0.3256
_backward_FullyConnected 180 162.2140 0.7430 1.6510 0.9012
SoftmaxOutput 200 129.6620 0.1250 1.2650 0.6483
FullyConnected 200 110.0570 0.2340 42.0660 0.5503
argmax 200 49.5320 0.1840 0.4930 0.2477
broadcast_add 1080 31.0420 0.0040 2.9860 0.0287
Embedding 200 25.0530 0.0240 3.8110 0.1253
_backward_SoftmaxOutput 180 19.0860 0.0560 0.8680 0.1060
square 540 18.5240 0.0030 2.8510 0.0343
sqrt 540 17.3870 0.0060 0.9440 0.0322
DeleteVariable 3532 11.3070 0.0020 0.0330 0.0032
broadcast_sub 540 8.2790 0.0040 0.0440 0.0153
broadcast_div 540 7.4970 0.0050 0.0730 0.0139
_plus_scalar 555 6.5850 0.0040 0.0650 0.0119
SetValueOp 8 5.8160 0.0050 5.6540 0.7270
CopyCPU2CPU 448 4.1040 0.0020 0.1030 0.0092
ResourceParallelRandomSetSeed 1 3.8440 3.8440 3.8440 3.8440
WaitForVar 220 1.3150 0.0040 0.0120 0.0060
Cast 33 1.1590 0.0080 0.2680 0.0351
_random_normal 2 0.7270 0.1210 0.6060 0.3635
_zeros 6 0.2650 0.0070 0.0720 0.0442
_div_scalar 15 0.1400 0.0050 0.0190 0.0093
SetupExec 6 0.0150 0.0010 0.0060 0.0025
_full 1 0.0060 0.0060 0.0060 0.0060
```
2. After the optimization:
```
operator
=================
Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
---- ----------- --------- ------------- ------------- -------------
_mul_scalar 1620 451.0970 0.0030 3.2960 0.2785
_backward_FullyConnected 180 195.9230 0.7440 2.3720 1.0885
SoftmaxOutput 200 156.3020 0.1080 1.2910 0.7815
FullyConnected 200 136.2320 0.2300 43.5920 0.6812
argmax 200 54.5550 0.1710 0.4960 0.2728
_backward_SoftmaxOutput 180 39.8900 0.0570 0.8930 0.2216
Embedding 200 27.0910 0.0270 3.1330 0.1355
broadcast_add 1080 24.6370 0.0040 0.6560 0.0228
_backward_Embedding 180 21.5230 0.0970 0.4120 0.1196
sqrt 540 20.1840 0.0060 0.1300 0.0374
square 540 19.2420 0.0040 2.9200 0.0356
DeleteVariable 3532 13.1160 0.0010 0.1310 0.0037
broadcast_sub 540 11.0550 0.0040 0.0980 0.0205
broadcast_div 540 9.3750 0.0050 0.1110 0.0174
_plus_scalar 555 8.2280 0.0040 0.1140 0.0148
SetValueOp 8 5.9090 0.0050 5.7620 0.7386
CopyCPU2CPU 448 4.2760 0.0030 0.1040 0.0095
ResourceParallelRandomSetSeed 1 3.8370 3.8370 3.8370 3.8370
Cast 33 1.2800 0.0090 0.2670 0.0388
WaitForVar 195 1.2160 0.0040 0.0180 0.0062
_random_normal 2 0.7190 0.1200 0.5990 0.3595
_zeros 6 0.2710 0.0060 0.0790 0.0452
_div_scalar 15 0.2610 0.0050 0.0790 0.0174
SetupExec 6 0.0150 0.0020 0.0050 0.0025
_full 1 0.0070 0.0070 0.0070 0.0070
```
----------------------------------------------------------------
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169969
Jenkins CI successfully triggered : [windows-gpu, unix-gpu]
----------------------------------------------------------------
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604173004
@ElaineBao do you expect this to be true at other places where tensor-level summation is used? If so, how about removing the tensor-level summation feature from mshadow and fix all places in the mxnet codebase?
----------------------------------------------------------------
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603828047
Hey @ElaineBao, thanks for submitting the PR.
Once your PR is ready for CI checks, invoke the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
***
**CI supported jobs**: [unix-cpu, edge, centos-gpu, windows-cpu, miscellaneous, sanity, unix-gpu, windows-gpu, clang, website, centos-cpu]
***
_Note_:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604176384
@leezu I can't say that all tensor-level summation is slow; after all, I haven't tested all the cases.
But changing tensor-level summation to element-wise summation actually increases the amount of code and makes it less readable, so unless there is a known efficiency issue, I think it's better to leave the code unchanged and keep the tensor-level summation.
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169748
> Is tensor-level summation slow as the compiler fails to optimize the mshadow implementation? Or what is the reason?
I'm not quite sure about the reason, but I think it may be related to temporary memory allocation.
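A rough plain-Python illustration of that hypothesis (hypothetical, not the mshadow code): a tensor-level expression can end up materializing an intermediate result before assigning it back, while an element-wise loop writes each destination element directly with no temporaries. Both produce the same values; only the allocation behavior differs.

```python
def accumulate_with_temp(dst, src):
    # "tensor-level" style: build a temporary row, then assign it back
    for i in range(len(dst)):
        tmp = [dst[i][j] + src[i][j] for j in range(len(dst[i]))]  # extra allocation per row
        dst[i] = tmp

def accumulate_in_place(dst, src):
    # "element-wise" style: update each element directly, no temporaries
    for i in range(len(dst)):
        for j in range(len(dst[i])):
            dst[i][j] += src[i][j]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 2.0], [3.0, 4.0]]
src = [[10.0, 10.0], [10.0, 10.0]]
accumulate_with_temp(a, src)
accumulate_in_place(b, src)
```

In a hot backward path, per-call temporary buffers like `tmp` could plausibly account for the overhead, but this remains a guess pending profiling of the mshadow expression templates themselves.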
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604189163
@TaoLv OK, I'll work on it
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609135780
@mxnet-bot run ci [unix-gpu, unix-cpu]
----------------------------------------------------------------
leezu merged pull request #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906
----------------------------------------------------------------
mxnet-bot commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609135956
Jenkins CI successfully triggered : [unix-gpu, unix-cpu]
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604169929
@mxnet-bot run ci [unix-gpu, windows-gpu]
----------------------------------------------------------------
ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603828312
@mxnet-bot run ci [all]
----------------------------------------------------------------
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-604180544
@ElaineBao Could you please share a benchmarking script so we can verify the effect of this optimization? opperf may help: https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf.
----------------------------------------------------------------
TaoLv commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609430145
> Sorry for the late reply.
> I tried to use opperf, but it doesn't work; it throws an error:
> File "/incubator-mxnet/benchmark/opperf/rules/default_params.py", line 606, in <module>
> "axis_shape": DEFAULT_AXIS_SHAPE,
> NameError: name 'DEFAULT_AXIS_SHAPE' is not defined
FYI, @ChaiBapchya.
----------------------------------------------------------------
leezu commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-603968342
Is tensor-level summation slow as the compiler fails to optimize the mshadow implementation? Or what is the reason?
----------------------------------------------------------------