Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/11/15 21:41:02 UTC

[GitHub] Wallart commented on issue #13037: ImageDetIter looping forever in MXNet-1.3.0

URL: https://github.com/apache/incubator-mxnet/issues/13037#issuecomment-439200178
 
 
I'm running my code in nvidia-docker containers (Ubuntu 17.10) with CUDA 9.2. I compiled each MXNet version from source with OpenCV and MKL-DNN support.
On the hardware side I'm using two GTX 1080 Ti cards.

Consider the following code snippet, which trains on a dataset in ImageRecord format (approximately 200 images):
    
```python
import time

import mxnet as mx
from mxnet import autograd, nd

# FocalLoss, SmoothL1Loss, training_targets, net, trainer, train_data,
# ctx, batch_size, start_epoch, epochs and log_interval are defined
# earlier in the script (omitted here).
cls_loss = FocalLoss()
box_loss = SmoothL1Loss()

cls_metric = mx.metric.Accuracy()
box_metric = mx.metric.MAE()

for epoch in range(start_epoch, epochs):
    # reset iterator and metrics, start the epoch timer
    train_data.reset()
    cls_metric.reset()
    box_metric.reset()
    epoch_tick = time.time()

    # iterate through all batches
    for i, batch in enumerate(train_data):
        batch_tick = time.time()

        # record gradients
        with autograd.record():
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)

            default_anchors, class_predictions, box_predictions = net(x)
            box_target, box_mask, cls_target = training_targets(
                default_anchors, class_predictions, y)

            # losses
            loss1 = cls_loss(class_predictions, cls_target)
            loss2 = box_loss(box_predictions, box_target, box_mask)

            # sum all losses
            loss = loss1 + loss2

            # backpropagate
            loss.backward()

        # apply the gradients
        trainer.step(batch_size)

        # update metrics
        cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        box_metric.update([box_target], [box_predictions * box_mask])

        if (i + 1) % log_interval == 0:
            name1, val1 = cls_metric.get()
            name2, val2 = box_metric.get()
            print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                  % (epoch, i, batch_size / (time.time() - batch_tick), name1, val1, name2, val2))

    # end of epoch logging
    name1, val1 = cls_metric.get()
    name2, val2 = box_metric.get()
    print('[Epoch %d] training: %s=%f, %s=%f' % (epoch, name1, val1, name2, val2))
    print('[Epoch %d] time cost: %f' % (epoch, time.time() - epoch_tick))
```
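
For context, `train_data` above is an `ImageDetIter` built from the record file, roughly like the sketch below (the file names, batch size and data shape are placeholders, not the exact values from my setup):

```python
import mxnet as mx

batch_size = 32
train_data = mx.image.ImageDetIter(
    batch_size=batch_size,
    data_shape=(3, 256, 256),     # placeholder shape: (channels, height, width)
    path_imgrec='train.rec',      # the ~200-image ImageRecord dataset
    path_imgidx='train.idx',
    shuffle=True)
```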
   
On MXNet 1.2.1 it works as expected and the epochs keep flowing through the console:

```
[Epoch 0] training: accuracy=0.833192, mae=0.004929
[Epoch 0] time cost: 1.240091
[Epoch 1] training: accuracy=0.966545, mae=0.004379
[Epoch 1] time cost: 0.610014
[Epoch 2] training: accuracy=0.976884, mae=0.003983
[Epoch 2] time cost: 0.631764
[Epoch 3] training: accuracy=0.983173, mae=0.004638
```
   
But on MXNet 1.3.0 an epoch never ends: the iterator keeps producing batches indefinitely instead of stopping after one pass over the dataset:

```
[Epoch 0 Batch 19] speed: 1155.356185 samples/s, training: accuracy=0.923830, mae=0.004783
[Epoch 0 Batch 39] speed: 1105.710115 samples/s, training: accuracy=0.954663, mae=0.004561
[Epoch 0 Batch 59] speed: 1169.286568 samples/s, training: accuracy=0.966536, mae=0.004413
[Epoch 0 Batch 79] speed: 1132.142250 samples/s, training: accuracy=0.973061, mae=0.004393
[Epoch 0 Batch 99] speed: 1115.432219 samples/s, training: accuracy=0.977253, mae=0.004304
[Epoch 0 Batch 119] speed: 1139.079420 samples/s, training: accuracy=0.980220, mae=0.004205
```
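
The looping can be reproduced without the training loop at all. A minimal sketch along these lines (again with placeholder paths and shapes) shows the iterator never stopping on 1.3.0, while on 1.2.1 it stops after a handful of batches:

```python
import mxnet as mx

# placeholder paths and shape; the dataset holds roughly 200 images
it = mx.image.ImageDetIter(
    batch_size=32,
    data_shape=(3, 256, 256),
    path_imgrec='train.rec',
    path_imgidx='train.idx',
    shuffle=True)

it.reset()
for i, batch in enumerate(it):
    # with ~200 images and batch_size=32 one pass is at most 7 batches
    if i >= 20:
        print('iterator did not stop after a full pass over the dataset')
        break
else:
    print('iterator stopped normally after %d batches' % (i + 1))
```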
