Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/29 12:42:48 UTC

[GitHub] rlantz-cfa opened a new issue #9226: Deferred Initialization Error after a forward pass

URL: https://github.com/apache/incubator-mxnet/issues/9226
 
 
   ## Description
   I'm following the Gluon Tutorial, and am attempting to build a custom object detection model by modifying the code found [here]([url](http://gluon.mxnet.io/chapter08_computer-vision/object-detection.html)).  After building out the architecture, I try running the training loop and end up with a Deferred initialization error even after I've (apparently) made a successful forward pass.
   
   ## Environment info (Required)
   Using SageMaker with the Python 3.6 MXNet kernel; the MXNet version is 0.12.1.
   
   ## Error Message:
   ```
   DataBatch: data shapes: [(12, 3, 400, 400)] label shapes: [(12, 51, 5)]
   I'm here
   and here
   and here too
   
   [ 0.16604483  0.12181112  0.13917704  0.12500151  0.33701715  0.10331827
     0.13935642  0.16480497  0.22345504  0.48216474  0.11068322  0.14671497]
   <NDArray 12 @cpu(0)>
   ---------------------------------------------------------------------------
   DeferredInitializationError               Traceback (most recent call last)
   <ipython-input-26-170751fb746e> in <module>()
        24             print(loss)
        25             loss.backward()
   ---> 26         trainer.step(batch_size)
        27         cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        28         box_metric.update([box_target], [box_predictions * box_mask])
   
   ~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in step(self, batch_size, ignore_stale_grad)
       160         """
       161         if not self._kv_initialized:
   --> 162             self._init_kvstore()
       163 
       164         self._optimizer.rescale_grad = self._scale / batch_size
   
   ~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in _init_kvstore(self)
       101 
       102     def _init_kvstore(self):
   --> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
       104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
       105                                                      arg_arrays)
   
   ~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in <dictcomp>(.0)
       101 
       102     def _init_kvstore(self):
   --> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
       104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
       105                                                      arg_arrays)
   
   ~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
       359         NDArray on ctx
       360         """
   --> 361         return self._check_and_get(self._data, ctx)
       362 
       363     def list_data(self):
   
   ~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
       161                 "Please pass one batch of data through the network before accessing Parameters. " \
       162                 "You can also avoid deferred initialization by specifying in_units, " \
   --> 163                 "num_features, etc., for network layers."%(self.name))
       164         raise RuntimeError(
       165             "Parameter %s has not been initialized. Note that " \
   
   DeferredInitializationError: Parameter toyssd1_conv2_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
   ```
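The error message points at Gluon's deferred (shape-inferred) initialization. As a plain-Python analogy of that mechanism (hypothetical `LazyParam` class, not MXNet's real `Parameter` API): a parameter's full shape depends on the input shape, so its array is only allocated once a forward pass has supplied one, and reading it before that fails.

```python
class DeferredInitializationError(RuntimeError):
    pass

class LazyParam:
    """Toy parameter whose array is allocated only once the input shape is known."""
    def __init__(self, name, out_units):
        self.name = name
        self.out_units = out_units
        self._data = None  # deferred until shape inference

    def data(self):
        if self._data is None:
            raise DeferredInitializationError(
                "Parameter %s has not been initialized yet because "
                "initialization was deferred." % self.name)
        return self._data

    def shape_inferred(self, in_units):
        # First forward pass: the full shape is now known, so allocate.
        self._data = [[0.0] * in_units for _ in range(self.out_units)]

w = LazyParam("conv2_weight", out_units=4)
try:
    w.data()                 # accessing before any forward pass fails
except DeferredInitializationError:
    print("deferred")
w.shape_inferred(in_units=3)  # analogous to running one batch through the net
print(len(w.data()))          # 4
```

In MXNet itself, `trainer.step()` reads every parameter's data (via `_init_kvstore` in the traceback above), which is why it is the call that surfaces the error.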
   
   ## Minimum reproducible example
   ```python
   for epoch in range(start_epoch, epochs):
       train_data.reset()
       cls_metric.reset()
       box_metric.reset()
       tic = time.time()
       for i, batch in enumerate(train_data):
           btic = time.time()
           print(batch)
           with ag.record():
               print("I'm here")
               x = batch.data[0].as_in_context(ctx)
               y = batch.label[0].as_in_context(ctx)
               print("and here")
               default_anchors, class_predictions, box_predictions = net(x)
               box_target, box_mask, cls_target = training_targets(default_anchors, class_predictions, y)
               print("and here too")
               loss1 = cls_loss(class_predictions, cls_target)
               loss2 = box_loss(box_predictions, box_target, box_mask)
               
               loss = loss1 + loss2
               print(loss)
               loss.backward()
           trainer.step(batch_size)
           cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
           box_metric.update([box_target], [box_predictions * box_mask])
           if (i + 1) % log_interval == 0:
               name1, val1 = cls_metric.get()
               name2, val2 = box_metric.get()
               print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                     %(epoch ,i, batch_size/(time.time()-btic), name1, val1, name2, val2))
               
       name1, val1 = cls_metric.get()
       name2, val2 = box_metric.get()
       print('[Epoch %d] training: %s=%f, %s=%f'%(epoch, name1, val1, name2, val2))
       print('[Epoch %d] time cost: %f'%(epoch, time.time()-tic))
   
   # we can save the trained parameters to disk
   net.save_params('ssd_%d.params' % epochs)
   ```
   If helpful, I can paste in the network architecture as well, though it's essentially the same as in the Gluon tutorial linked above, with the small exception that my data input shape is `(400, 400)`.
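One plausible explanation for "the forward pass succeeded but `trainer.step()` still fails" (a guess, not confirmed from the code above): the forward pass only shape-infers the parameters it actually touches, while `trainer.step()` gathers data for *all* registered parameters, so any layer the forward pass never exercised is still deferred. Sketched in plain Python with hypothetical `ToyParam`/`ToyNet`/`ToyTrainer` names, not MXNet APIs:

```python
class DeferredInitializationError(RuntimeError):
    pass

class ToyParam:
    def __init__(self, name):
        self.name = name
        self._data = None          # deferred
    def init(self):
        self._data = [0.0]         # stands in for shape inference + allocation
    def data(self):
        if self._data is None:
            raise DeferredInitializationError(self.name)
        return self._data

class ToyNet:
    def __init__(self):
        self.conv1_weight = ToyParam("conv1_weight")
        self.conv2_weight = ToyParam("conv2_weight")   # never used in forward
    def forward(self, x):
        self.conv1_weight.init()   # only the layer that ran gets initialized
        return x

class ToyTrainer:
    def __init__(self, params):
        self.params = params
    def step(self):
        # mirrors Gluon's _init_kvstore: reads data for ALL parameters
        return {p.name: p.data() for p in self.params}

net = ToyNet()
net.forward([1.0])                 # "successful" forward pass
trainer = ToyTrainer([net.conv1_weight, net.conv2_weight])
try:
    trainer.step()
except DeferredInitializationError as e:
    print("step failed on", e)     # conv2_weight is still deferred
```

If that's the cause here, the fixes the error message suggests would apply: either make sure one batch flows through every layer before `trainer.step(...)`, or declare input shapes up front (e.g. `in_units`/`in_channels`) so nothing is deferred.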
   
