Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/12/13 07:09:14 UTC
[GitHub] marekjg opened a new issue #13634: Slow CPU inference in Gluon GRU module
URL: https://github.com/apache/incubator-mxnet/issues/13634
## Description
`gluon.rnn.GRU` inference is slow on the CPU compared to `nd.RNN` in GRU mode for the same input.
## Environment info
Deep Learning AMI 19, Tesla V100
```
----------Python Info----------
Version : 3.7.1
Compiler : GCC 7.3.0
Build : ('default', 'Oct 23 2018 19:19:42')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.1
Directory : /home/ec2-user/anaconda3/envs/gmarek_mx13/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.5.0
Directory : /home/ec2-user/anaconda3/envs/gmarek_mx13/lib/python3.7/site-packages/mxnet
Commit Hash : b45e1273ece8eba1a011107ce12032af58efe661
----------System Info----------
Platform : Linux-4.14.77-70.59.amzn1.x86_64-x86_64-with-glibc2.10
system : Linux
node : ip-172-31-44-214
release : 4.14.77-70.59.amzn1.x86_64
version : #1 SMP Mon Nov 12 22:02:45 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2701.073
BogoMIPS: 4600.18
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0018 sec, LOAD: 0.7860 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0006 sec, LOAD: 0.5938 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0006 sec, LOAD: 0.0175 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0004 sec, LOAD: 1.0119 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0114 sec, LOAD: 0.4352 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0004 sec, LOAD: 0.0866 sec.
```
## Minimum reproducible example
```python
from time import time

import mxnet as mx
from mxnet import nd
from mxnet import gluon
from mxnet.gluon import rnn

inp_dim = 1024
hid_dim = 1024
n_layers = 1
n_parameters = (inp_dim * hid_dim + hid_dim + hid_dim * hid_dim + hid_dim) * 3
n_steps = 100

for ctx in [mx.cpu(), mx.gpu()]:
    gru_params = nd.random.uniform(low=-1, high=1, shape=(n_parameters,), ctx=ctx)
    gru_ndarray = lambda x, h_0: nd.RNN(x, gru_params, h_0, num_layers=n_layers,
                                        state_size=hid_dim, mode='gru',
                                        state_outputs=True)
    gru_gluon = rnn.GRU(hid_dim, n_layers, input_size=inp_dim)
    gru_gluon.collect_params().initialize(ctx=ctx)
    gru_gluon.hybridize()
    x = nd.random_normal(0, 1, (1, 1, inp_dim), ctx=ctx)
    h_0 = x
    # JIC: warm-up
    _, _ = gru_gluon(x, h_0)
    nd.waitall()
    for method, gru in [('ndarray', gru_ndarray), ('gluon', gru_gluon)]:
        h = h_0
        start = time()
        for step in range(n_steps):
            _, h = gru(x, h)
            if method == 'gluon':
                h = h[0]
        nd.waitall()
        dt = time() - start
        print(ctx, method, dt)
```
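For reference, the `n_parameters` value in the script matches the flat parameter layout the fused `nd.RNN` kernel expects for a single-layer GRU: three gates, each with an input-to-hidden weight matrix, a hidden-to-hidden weight matrix, and one bias vector per matrix. A quick sanity check of the arithmetic (pure Python, no MXNet required):

```python
# Per gate: i2h weight (inp*hid) + i2h bias (hid)
#         + h2h weight (hid*hid) + h2h bias (hid).
# A GRU has 3 gates (reset, update, new).
inp_dim = 1024
hid_dim = 1024
per_gate = inp_dim * hid_dim + hid_dim + hid_dim * hid_dim + hid_dim
n_parameters = 3 * per_gate
print(n_parameters)  # 6297600
```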
## Steps to reproduce
Run the above script with Python.
## Output
`gluon.rnn.GRU` is significantly slower than `nd.RNN`, especially on CPU (device, method, time in seconds):
cpu(0) ndarray 0.07194805145263672
cpu(0) gluon 4.735473394393921
gpu(0) ndarray 0.013593673706054688
gpu(0) gluon 0.04437994956970215
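Worked out from the timings above, the Gluon path is roughly 66x slower than the fused `nd.RNN` kernel on CPU, versus only about 3x slower on GPU (a quick calculation, not additional measurements):

```python
# Timings copied from the output above (seconds for 100 steps).
cpu_ndarray, cpu_gluon = 0.07194805145263672, 4.735473394393921
gpu_ndarray, gpu_gluon = 0.013593673706054688, 0.04437994956970215

print(f"CPU slowdown: {cpu_gluon / cpu_ndarray:.1f}x")  # roughly 65.8x
print(f"GPU slowdown: {gpu_gluon / gpu_ndarray:.1f}x")  # roughly 3.3x
```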