Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/04 16:07:27 UTC
[GitHub] [incubator-mxnet] ElaineBao commented on issue #17906: Optimize AddTakeGrad Tensor Sum
URL: https://github.com/apache/incubator-mxnet/pull/17906#issuecomment-609050759
Sorry for the late reply.
I tried to use opperf, but it doesn't work; an error is thrown when I run it:
```
  File "/incubator-mxnet/benchmark/opperf/rules/default_params.py", line 606, in <module>
    "axis_shape": DEFAULT_AXIS_SHAPE,
NameError: name 'DEFAULT_AXIS_SHAPE' is not defined
```
So I used the MXNet profiler instead to validate the performance, which I think is also reasonable.
The script is as follows:
```python
import random
import pandas as pd
import mxnet as mx
import numpy as np
from sklearn.model_selection import train_test_split

batch_size = 1000
num_epoch = 5
model_prefix = 'drivethru_attention_d'
n_plus = 522
total = 40000
profiling = True

# Generate random (pluids, label) records
records = []
for i in range(0, total):
    pluids = [random.randint(0, n_plus - 1) for i in range(0, 5)]
    label = random.randint(0, 1)
    records.append((pluids, label))
data = pd.DataFrame(records, columns=['pluids', 'label'])
train, test = train_test_split(data, test_size=0.1, random_state=100)

X_train = mx.io.NDArrayIter(data={'pluids': np.array(train['pluids'].values.tolist(), dtype=int)},
                            label={'output_label': train['label'].values},
                            batch_size=batch_size,
                            shuffle=True)
X_eval = mx.io.NDArrayIter(data={'pluids': np.array(test['pluids'].values.tolist(), dtype=int)},
                           label={'output_label': test['label'].values},
                           batch_size=batch_size,
                           shuffle=True)

# Embedding -> FullyConnected -> SoftmaxOutput model
y_true = mx.symbol.Variable('output_label')
pluids = mx.symbol.Variable('pluids')
plu_embed = mx.symbol.Embedding(data=pluids, input_dim=n_plus, output_dim=50, name='plu_embed')
fc1 = mx.symbol.FullyConnected(data=plu_embed, num_hidden=int(n_plus), name='fc1')
rec_model = mx.symbol.SoftmaxOutput(data=fc1, label=y_true, name='output')
mod = mx.mod.Module(symbol=rec_model,
                    data_names=['pluids'],
                    label_names=['output_label'],
                    context=[mx.cpu()])

# Enable the profiler
mx.profiler.set_config(profile_symbolic=True, profile_imperative=True, profile_memory=False,
                       profile_api=True, filename='profile.json', aggregate_stats=True)
mx.profiler.set_state('run')
mod.fit(train_data=X_train,
        num_epoch=num_epoch,
        initializer=mx.init.Xavier(rnd_type="gaussian"),
        optimizer='adagrad',
        eval_metric=['accuracy'],
        validation_metric=['accuracy', mx.metric.TopKAccuracy(3)],
        eval_data=X_eval,
        batch_end_callback=mx.callback.Speedometer(batch_size, 2))
mx.profiler.set_state('stop')
print(mx.profiler.dumps())
```
And here are the performance results:
1. Before the optimization of `_backward_Embedding`:
```
operator
=================
Name                            Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                            -----------    ---------  -------------  -------------  -------------
_backward_Embedding                     180    2854.3340        12.3320        29.1350        15.8574
_mul_scalar                            1620     527.4130         0.0030         3.3110         0.3256
_backward_FullyConnected                180     162.2140         0.7430         1.6510         0.9012
SoftmaxOutput                           200     129.6620         0.1250         1.2650         0.6483
FullyConnected                          200     110.0570         0.2340        42.0660         0.5503
argmax                                  200      49.5320         0.1840         0.4930         0.2477
broadcast_add                          1080      31.0420         0.0040         2.9860         0.0287
Embedding                               200      25.0530         0.0240         3.8110         0.1253
_backward_SoftmaxOutput                 180      19.0860         0.0560         0.8680         0.1060
square                                  540      18.5240         0.0030         2.8510         0.0343
sqrt                                    540      17.3870         0.0060         0.9440         0.0322
DeleteVariable                         3532      11.3070         0.0020         0.0330         0.0032
broadcast_sub                           540       8.2790         0.0040         0.0440         0.0153
broadcast_div                           540       7.4970         0.0050         0.0730         0.0139
_plus_scalar                            555       6.5850         0.0040         0.0650         0.0119
SetValueOp                                8       5.8160         0.0050         5.6540         0.7270
CopyCPU2CPU                             448       4.1040         0.0020         0.1030         0.0092
ResourceParallelRandomSetSeed             1       3.8440         3.8440         3.8440         3.8440
WaitForVar                              220       1.3150         0.0040         0.0120         0.0060
Cast                                     33       1.1590         0.0080         0.2680         0.0351
_random_normal                            2       0.7270         0.1210         0.6060         0.3635
_zeros                                    6       0.2650         0.0070         0.0720         0.0442
_div_scalar                              15       0.1400         0.0050         0.0190         0.0093
SetupExec                                 6       0.0150         0.0010         0.0060         0.0025
_full                                     1       0.0060         0.0060         0.0060         0.0060
```
2. After the optimization:
```
operator
=================
Name                            Total Count    Time (ms)  Min Time (ms)  Max Time (ms)  Avg Time (ms)
----                            -----------    ---------  -------------  -------------  -------------
_mul_scalar                            1620     451.0970         0.0030         3.2960         0.2785
_backward_FullyConnected                180     195.9230         0.7440         2.3720         1.0885
SoftmaxOutput                           200     156.3020         0.1080         1.2910         0.7815
FullyConnected                          200     136.2320         0.2300        43.5920         0.6812
argmax                                  200      54.5550         0.1710         0.4960         0.2728
_backward_SoftmaxOutput                 180      39.8900         0.0570         0.8930         0.2216
Embedding                               200      27.0910         0.0270         3.1330         0.1355
broadcast_add                          1080      24.6370         0.0040         0.6560         0.0228
_backward_Embedding                     180      21.5230         0.0970         0.4120         0.1196
sqrt                                    540      20.1840         0.0060         0.1300         0.0374
square                                  540      19.2420         0.0040         2.9200         0.0356
DeleteVariable                         3532      13.1160         0.0010         0.1310         0.0037
broadcast_sub                           540      11.0550         0.0040         0.0980         0.0205
broadcast_div                           540       9.3750         0.0050         0.1110         0.0174
_plus_scalar                            555       8.2280         0.0040         0.1140         0.0148
SetValueOp                                8       5.9090         0.0050         5.7620         0.7386
CopyCPU2CPU                             448       4.2760         0.0030         0.1040         0.0095
ResourceParallelRandomSetSeed             1       3.8370         3.8370         3.8370         3.8370
Cast                                     33       1.2800         0.0090         0.2670         0.0388
WaitForVar                              195       1.2160         0.0040         0.0180         0.0062
_random_normal                            2       0.7190         0.1200         0.5990         0.3595
_zeros                                    6       0.2710         0.0060         0.0790         0.0452
_div_scalar                              15       0.2610         0.0050         0.0790         0.0174
SetupExec                                 6       0.0150         0.0020         0.0050         0.0025
_full                                     1       0.0070         0.0070         0.0070         0.0070
```
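For reference, a quick arithmetic check of the two tables above: the total `_backward_Embedding` time drops from 2854.334 ms to 21.523 ms over the same 180 calls, i.e. roughly a 130x speedup on this workload:

```python
# Total _backward_Embedding time taken from the two profiler dumps above
before_ms = 2854.3340  # before the optimization
after_ms = 21.5230     # after the optimization

speedup = before_ms / after_ms
print(f"_backward_Embedding total-time speedup: {speedup:.1f}x")  # ~132.6x
```

(The other operators' timings fluctuate a bit between runs, but `_backward_Embedding` is the only one that changes by orders of magnitude.)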
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services