You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/08/24 09:18:40 UTC

[GitHub] SherlockLiao opened a new issue #7593: why gluon is slower than PyTorch?

SherlockLiao opened a new issue #7593: why gluon is slower than PyTorch?
URL: https://github.com/apache/incubator-mxnet/issues/7593
 
 
   For bugs or installation issues, please provide the following information.
   The more information you provide, the more likely people will be able to help you.
   
   ## Environment info
   Operating System: 16.04.2 LTS
   
   Compiler:
   
   Package used (Python/R/Scala/Julia): python
   
   MXNet version: mxnet-cu80 0.11
   
   Or if installed from source:
   
   MXNet commit hash (`git rev-parse HEAD`):
   
   If you are using python package, please provide
   
   Python version and distribution:
   
   If you are using R package, please provide
   
   R `sessionInfo()`:
   
   ## Error Message:
   I think Gluon should be faster than pytorch, or at least the same speed. But I write a small network, lenet using gluon and pytorch. The hyperparameters are same. I run 20 epochs, and the total time of pytorch is 69.515576 s, but time of the gluon is 175.097399 s. It seems gluon is much slower than pytorch. I don't know if I write gluon code in a wrong way.
   
   Here is my code of two version.
   
   ### Pytorch
   ```python
   import torch
   import torchvision
   from torch.utils.data import DataLoader
   from torchvision.datasets import MNIST
   from torchvision import transforms
   from torch.autograd import Variable
   from torch import optim
   import torch.nn as nn
   import torch.nn.functional as F
   import time
   
   learning_rate = 1e-3
   batch_size = 64
   epoches = 20
   
   trans_img = transforms.ToTensor()
   
   trainset = MNIST('./data', train=True, transform=trans_img)
   testset = MNIST('./data', train=False, transform=trans_img)
   
   trainloader = DataLoader(
       trainset, batch_size=batch_size, shuffle=True, num_workers=4)
   testloader = DataLoader(
       testset, batch_size=batch_size, shuffle=False, num_workers=4)
   
   
   # build network
   class Lenet(nn.Module):
       def __init__(self):
           super(Lenet, self).__init__()
           self.conv = nn.Sequential(
               nn.Conv2d(1, 6, 3, stride=1, padding=1),
               nn.MaxPool2d(2, 2),
               nn.Conv2d(6, 16, 5, stride=1, padding=0), nn.MaxPool2d(2, 2))
   
           self.fc = nn.Sequential(
               nn.Linear(400, 120), nn.Linear(120, 84), nn.Linear(84, 10))
   
       def forward(self, x):
           out = self.conv(x)
           out = out.view(out.size(0), -1)
           out = self.fc(out)
           return out
   
   
   lenet = Lenet()
   lenet.cuda()
   
   criterian = nn.CrossEntropyLoss(size_average=False)
   optimizer = optim.SGD(lenet.parameters(), lr=learning_rate)
   
   # train
   start = time.time()
   for i in range(epoches):
       running_loss = 0.
       running_acc = 0.
       for (img, label) in trainloader:
           img = Variable(img).cuda()
           label = Variable(label).cuda()
   
           optimizer.zero_grad()
           output = lenet(img)
           loss = criterian(output, label)
           # backward
           loss.backward()
           optimizer.step()
   
           running_loss += loss.data[0]
           _, predict = torch.max(output, 1)
           correct_num = (predict == label).sum()
           running_acc += correct_num.data[0]
   
       running_loss /= len(trainset)
       running_acc /= len(trainset)
       print("[%d/%d] Loss: %.5f, Acc: %.2f" % (i + 1, epoches, running_loss,
                                                100 * running_acc))
   
   print('Time {:.6f}'.format(time.time() - start))
   ```
   
   ### gluon
   ```python
   import time
   
   import mxnet as mx
   import mxnet.gluon as g
   import numpy as np
   
   # define hyperparameters
   batch_size = 64
   learning_rate = 1e-3
   epochs = 20
   step = 300
   ctx = mx.gpu()
   
   
   # define data transform
   def data_transform(data, label):
       return mx.nd.transpose(data.astype(np.float32) / 255,
                              (2, 0, 1)), label.astype(np.float32)
   
   
   # define dataset and dataloader
   train_dataset = g.data.vision.MNIST(transform=data_transform)
   test_dataset = g.data.vision.MNIST(train=False, transform=data_transform)
   
   train_loader = g.data.DataLoader(
       train_dataset, batch_size=batch_size, shuffle=True)
   test_loader = g.data.DataLoader(
       test_dataset, batch_size=batch_size, shuffle=False)
   
   # define model
   lenet = g.nn.Sequential(prefix='lenet_')
   with lenet.name_scope():
       lenet.add(g.nn.Conv2D(6, 3, strides=1, padding=1))
       lenet.add(g.nn.MaxPool2D(2, 2))
       lenet.add(g.nn.Conv2D(16, 5, strides=1))
       lenet.add(g.nn.MaxPool2D(2, 2))
       lenet.add(g.nn.Flatten())
       lenet.add(g.nn.Dense(120))
       lenet.add(g.nn.Dense(84))
       lenet.add(g.nn.Dense(10))
   
   lenet.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
   
   criterion = g.loss.SoftmaxCrossEntropyLoss()
   optimizer = g.Trainer(lenet.collect_params(), 'sgd',
                         {'learning_rate': learning_rate})
   
   # start train
   start = time.time()
   for e in range(epochs):
       print('*' * 10)
       print('epoch {}'.format(e + 1))
       moving_loss = 0.0
       moving_acc = 0.0
       for i, (img, label) in enumerate(train_loader, 1):
           img = img.as_in_context(ctx)
           label = label.as_in_context(ctx)
           with g.autograd.record():
               output = lenet(img)
               loss = criterion(output, label)
           loss.backward()
           optimizer.step(img.shape[0])
           # =========== keep average loss and accuracy ==============
           moving_loss += mx.nd.mean(loss).asscalar()
           predict = mx.nd.argmax(output, axis=1)
           acc = mx.nd.mean(predict == label).asscalar()
           moving_acc += acc
   
           if i % step == 0:
               print('[{}/{}] Loss: {:.6f}, Acc: {:.6f}'.format(
                   i, len(train_loader), moving_loss / step, moving_acc / step))
               moving_loss = 0.0
               moving_acc = 0.0
   print('Time {:.6f} s'.format(time.time() - start))
   ```
   ## Minimum reproducible example
   if you are using your own code, please provide a short script that reproduces the error.
   
   ## Steps to reproduce
   or if you are running standard examples, please provide the commands you have run that lead to the error.
   
   1.
   2.
   3.
   
   ## What have you tried to solve it?
   
   1.
   2.
   3.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services