Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/08/24 09:18:40 UTC
[GitHub] SherlockLiao opened a new issue #7593: why gluon is slower than PyTorch?
URL: https://github.com/apache/incubator-mxnet/issues/7593
For bugs or installation issues, please provide the following information.
The more information you provide, the more likely people will be able to help you.
## Environment info
Operating System: Ubuntu 16.04.2 LTS
Compiler:
Package used (Python/R/Scala/Julia): python
MXNet version: mxnet-cu80 0.11
Or if installed from source:
MXNet commit hash (`git rev-parse HEAD`):
If you are using python package, please provide
Python version and distribution:
If you are using R package, please provide
R `sessionInfo()`:
## Error Message:
I expected Gluon to be faster than PyTorch, or at least as fast. But I wrote a small network, LeNet, in both Gluon and PyTorch with the same hyperparameters and trained each for 20 epochs. The total time for PyTorch was 69.515576 s, while Gluon took 175.097399 s. Gluon seems to be much slower than PyTorch. I don't know whether I wrote the Gluon code the wrong way.
Here is my code for both versions.
### PyTorch
```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
from torch.autograd import Variable
from torch import optim
import torch.nn as nn
import torch.nn.functional as F
import time

# define hyperparameters
learning_rate = 1e-3
batch_size = 64
epoches = 20

# define dataset and dataloader (4 worker processes load batches in parallel)
trans_img = transforms.ToTensor()
trainset = MNIST('./data', train=True, transform=trans_img)
testset = MNIST('./data', train=False, transform=trans_img)
trainloader = DataLoader(
    trainset, batch_size=batch_size, shuffle=True, num_workers=4)
testloader = DataLoader(
    testset, batch_size=batch_size, shuffle=False, num_workers=4)


# build network
class Lenet(nn.Module):
    def __init__(self):
        super(Lenet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, 3, stride=1, padding=1),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 5, stride=1, padding=0), nn.MaxPool2d(2, 2))
        self.fc = nn.Sequential(
            nn.Linear(400, 120), nn.Linear(120, 84), nn.Linear(84, 10))

    def forward(self, x):
        out = self.conv(x)
        out = out.view(out.size(0), -1)  # flatten to (batch, 400)
        out = self.fc(out)
        return out


lenet = Lenet()
lenet.cuda()
# size_average=False sums the loss over the batch instead of averaging it
criterian = nn.CrossEntropyLoss(size_average=False)
optimizer = optim.SGD(lenet.parameters(), lr=learning_rate)

# train
start = time.time()
for i in range(epoches):
    running_loss = 0.
    running_acc = 0.
    for (img, label) in trainloader:
        img = Variable(img).cuda()
        label = Variable(label).cuda()
        optimizer.zero_grad()
        output = lenet(img)
        loss = criterian(output, label)
        # backward
        loss.backward()
        optimizer.step()
        # loss.data[0] is the pre-0.4 way to read a scalar loss
        running_loss += loss.data[0]
        _, predict = torch.max(output, 1)
        correct_num = (predict == label).sum()
        running_acc += correct_num.data[0]
    # per-epoch averages over the whole training set
    running_loss /= len(trainset)
    running_acc /= len(trainset)
    print("[%d/%d] Loss: %.5f, Acc: %.2f" % (i + 1, epoches, running_loss,
                                             100 * running_acc))
# total wall-clock time for all epochs
print('Time {:.6f}'.format(time.time() - start))
```
### Gluon
```python
import time
import mxnet as mx
import mxnet.gluon as g
import numpy as np

# define hyperparameters
batch_size = 64
learning_rate = 1e-3
epochs = 20
step = 300  # print running statistics every `step` batches
ctx = mx.gpu()


# define data transform (applied to each sample: scale to [0, 1], HWC -> CHW)
def data_transform(data, label):
    return mx.nd.transpose(data.astype(np.float32) / 255,
                           (2, 0, 1)), label.astype(np.float32)


# define dataset and dataloader
train_dataset = g.data.vision.MNIST(transform=data_transform)
test_dataset = g.data.vision.MNIST(train=False, transform=data_transform)
train_loader = g.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True)
test_loader = g.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=False)

# define model
lenet = g.nn.Sequential(prefix='lenet_')
with lenet.name_scope():
    lenet.add(g.nn.Conv2D(6, 3, strides=1, padding=1))
    lenet.add(g.nn.MaxPool2D(2, 2))
    lenet.add(g.nn.Conv2D(16, 5, strides=1))
    lenet.add(g.nn.MaxPool2D(2, 2))
    lenet.add(g.nn.Flatten())
    lenet.add(g.nn.Dense(120))
    lenet.add(g.nn.Dense(84))
    lenet.add(g.nn.Dense(10))
lenet.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
criterion = g.loss.SoftmaxCrossEntropyLoss()
optimizer = g.Trainer(lenet.collect_params(), 'sgd',
                      {'learning_rate': learning_rate})

# start train
start = time.time()
for e in range(epochs):
    print('*' * 10)
    print('epoch {}'.format(e + 1))
    moving_loss = 0.0
    moving_acc = 0.0
    for i, (img, label) in enumerate(train_loader, 1):
        img = img.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with g.autograd.record():
            output = lenet(img)
            loss = criterion(output, label)
        loss.backward()
        # step(batch_size) rescales the gradient by 1 / batch_size
        optimizer.step(img.shape[0])
        # =========== keep average loss and accuracy ==============
        # each asscalar() call blocks until the GPU result is ready
        moving_loss += mx.nd.mean(loss).asscalar()
        predict = mx.nd.argmax(output, axis=1)
        acc = mx.nd.mean(predict == label).asscalar()
        moving_acc += acc
        if i % step == 0:
            print('[{}/{}] Loss: {:.6f}, Acc: {:.6f}'.format(
                i, len(train_loader), moving_loss / step, moving_acc / step))
            moving_loss = 0.0
            moving_acc = 0.0
# total wall-clock time for all epochs
print('Time {:.6f} s'.format(time.time() - start))
```
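Both scripts only measure total wall-clock time, so they cannot tell whether the gap comes from data loading or from the network itself. A minimal sketch of one way to split the two, using only the standard library, might look like the following. The names `PhaseTimer`, `timed_batches`, and `fake_loader` are hypothetical helpers made up for this illustration, not part of MXNet or PyTorch, and the `time.sleep` calls merely stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase of a training loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


def timed_batches(loader, timer, name="data"):
    """Yield batches from `loader`, charging only the fetch time to `name`."""
    it = iter(loader)
    while True:
        with timer.phase(name):
            try:
                batch = next(it)
            except StopIteration:
                return
        # The yield sits outside the timed block, so the time the caller
        # spends on each batch is not counted as data loading.
        yield batch


def fake_loader(n, delay):
    """Hypothetical stand-in for a DataLoader with a slow per-batch transform."""
    for i in range(n):
        time.sleep(delay)
        yield i


timer = PhaseTimer()
for batch in timed_batches(fake_loader(5, 0.01), timer):
    with timer.phase("compute"):
        time.sleep(0.002)  # stands in for forward / backward / optimizer step
        # With a real GPU model, synchronize before leaving this block
        # (for example mx.nd.waitall() or torch.cuda.synchronize()),
        # because both frameworks launch GPU work asynchronously.

for name, seconds in sorted(timer.totals.items()):
    print("{}: {:.4f} s".format(name, seconds))
```

To apply this to the scripts above, one would wrap `train_loader` or `trainloader` in `timed_batches(...)` and put the forward, backward, and step calls inside `timer.phase("compute")`. If the "data" total dominates in the Gluon run, the per-sample `data_transform` and the single-process `DataLoader` would be the first things to look at.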