Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/01/30 02:55:44 UTC
[GitHub] PistonY edited a comment on issue #13709: Why FP16 training speed
is too slow on Tesla T4 in Gluon?
URL: https://github.com/apache/incubator-mxnet/issues/13709#issuecomment-458792488
I tried using a fixed input: FP32 works well, but FP16 runs out of memory.
This is my script:
```python
from mxnet import nd, autograd
from mxnet import gluon
from mxnet.gluon import loss as gloss
from gluoncv.model_zoo import *
import mxnet as mx
import time

ctx = mx.gpu(0)
data = nd.random.normal(shape=(64, 3, 224, 224), ctx=ctx)
label = nd.random.randint(low=0, high=1, shape=(64, 1), ctx=ctx)
net = resnet101_v2()
net.hybridize()
net.initialize(ctx=ctx)
net(data)  # warm-up forward pass

test_num = 500
dtype = 'float16'  # float32 or float16
if dtype != 'float32':
    net.cast(dtype)

Loss = gloss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(),
                        'nag', {'learning_rate': 0.1, 'momentum': 0.9,
                                'multi_precision': True  # when fp16 is enabled
                                })

sta = time.time()
for _ in range(test_num):
    with autograd.record():
        output = net(data.astype(dtype, copy=False))
        loss = Loss(output, label.astype(dtype, copy=False))
    loss.backward()
    trainer.step(64)  # batch size is 64
nd.waitall()  # block until all queued ops finish before stopping the timer
end = time.time()
print(end - sta)
```
My MXNet version is 1.5.0 (installed with --pre).
Training with FP32 costs 9921 MB of memory and takes 75 s.
But when I tested with FP16, memory usage started around 7000 MB and kept growing until it ran out of memory.
I don't know why; it looks like the memory is never freed.
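One thing worth checking: MXNet's engine is asynchronous, so the loop above only *enqueues* work, and `time.time()` can return before any computation has finished; operations (and their memory) can pile up faster than they are executed. A hypothetical pure-Python sketch of the same pitfall, using `concurrent.futures` as a stand-in for the async engine (`slow_op` and the timings are illustrative, not MXNet APIs):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_op(x):
    # Stand-in for a GPU kernel that takes real time to execute.
    time.sleep(0.01)
    return x * 2

executor = ThreadPoolExecutor(max_workers=1)

start = time.time()
# Submitting work is nearly instant, like queueing ops on MXNet's engine.
futures = [executor.submit(slow_op, i) for i in range(20)]
queued = time.time() - start

# Blocking on the results is the analogue of nd.waitall():
# only now has the work actually been done.
results = [f.result() for f in futures]
finished = time.time() - start

print(f"enqueue: {queued:.3f}s, finish: {finished:.3f}s")
```

Timing the submission loop alone reports a near-zero "enqueue" time, while the real work takes ~0.2 s; that is why a benchmark without a synchronization point can show misleading times and unbounded queue growth.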
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services