Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/29 12:42:48 UTC
[GitHub] rlantz-cfa opened a new issue #9226: Deferred Initialization Error after a forward pass
URL: https://github.com/apache/incubator-mxnet/issues/9226
## Description
I'm following the Gluon tutorial and am attempting to build a custom object detection model by modifying the code found [here](http://gluon.mxnet.io/chapter08_computer-vision/object-detection.html). After building out the architecture, I try running the training loop and end up with a DeferredInitializationError, even though I've (apparently) already made a successful forward pass.
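
For context, this is the deferred-initialization behaviour I expected based on the tutorial (a minimal, made-up sketch, not my actual network): parameters get their shapes from the first forward pass, after which they should be accessible.
```
from mxnet import nd, gluon

# Toy network with input sizes left unspecified, so initialization is deferred.
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3))  # no in_channels given
    net.add(gluon.nn.Dense(10))                           # no in_units given

net.initialize()                 # shapes are unknown here, so init is deferred
net(nd.zeros((1, 3, 400, 400)))  # first forward pass infers shapes and initializes

# After the forward pass, every parameter should have a concrete shape.
for name, param in net.collect_params().items():
    print(name, param.data().shape)
```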
## Environment info (Required)
Using Amazon SageMaker with the Python 3.6 MXNet kernel; the MXNet version is 0.12.1.
## Error Message:
```
DataBatch: data shapes: [(12, 3, 400, 400)] label shapes: [(12, 51, 5)]
I'm here
and here
and here too
[ 0.16604483 0.12181112 0.13917704 0.12500151 0.33701715 0.10331827
0.13935642 0.16480497 0.22345504 0.48216474 0.11068322 0.14671497]
<NDArray 12 @cpu(0)>
---------------------------------------------------------------------------
DeferredInitializationError Traceback (most recent call last)
<ipython-input-26-170751fb746e> in <module>()
24 print(loss)
25 loss.backward()
---> 26 trainer.step(batch_size)
27 cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
28 box_metric.update([box_target], [box_predictions * box_mask])
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in step(self, batch_size, ignore_stale_grad)
160 """
161 if not self._kv_initialized:
--> 162 self._init_kvstore()
163
164 self._optimizer.rescale_grad = self._scale / batch_size
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in _init_kvstore(self)
101
102 def _init_kvstore(self):
--> 103 arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
104 kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
105 arg_arrays)
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in <dictcomp>(.0)
101
102 def _init_kvstore(self):
--> 103 arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
104 kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
105 arg_arrays)
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
359 NDArray on ctx
360 """
--> 361 return self._check_and_get(self._data, ctx)
362
363 def list_data(self):
~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
161 "Please pass one batch of data through the network before accessing Parameters. " \
162 "You can also avoid deferred initialization by specifying in_units, " \
--> 163 "num_features, etc., for network layers."%(self.name))
164 raise RuntimeError(
165 "Parameter %s has not been initialized. Note that " \
DeferredInitializationError: Parameter toyssd1_conv2_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
```
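
What puzzles me is that the error names `toyssd1_conv2_weight` even though the forward pass clearly ran. My guess (illustrated below with a made-up toy block, not my real model) is that a parameter can stay deferred if its layer is defined but never actually used in `forward`:
```
from mxnet import nd, gluon
from mxnet.gluon.parameter import DeferredInitializationError

class Toy(gluon.Block):
    def __init__(self, **kwargs):
        super(Toy, self).__init__(**kwargs)
        with self.name_scope():
            self.conv1 = gluon.nn.Conv2D(8, 3)
            self.conv2 = gluon.nn.Conv2D(8, 3)  # defined but never called below

    def forward(self, x):
        return self.conv1(x)  # conv2 never sees data, so its init stays deferred

net = Toy()
net.initialize()
net(nd.zeros((1, 3, 32, 32)))  # forward pass succeeds

# Check which parameters are still waiting on shape inference.
for name, param in net.collect_params().items():
    try:
        param.data()
        print(name, 'initialized')
    except DeferredInitializationError:
        print(name, 'still deferred')
```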
## Minimum reproducible example
```
for epoch in range(start_epoch, epochs):
    train_data.reset()
    cls_metric.reset()
    box_metric.reset()
    tic = time.time()
    for i, batch in enumerate(train_data):
        btic = time.time()
        print(batch)
        with ag.record():
            print("I'm here")
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            print("and here")
            default_anchors, class_predictions, box_predictions = net(x)
            box_target, box_mask, cls_target = training_targets(default_anchors, class_predictions, y)
            print("and here too")
            loss1 = cls_loss(class_predictions, cls_target)
            loss2 = box_loss(box_predictions, box_target, box_mask)
            loss = loss1 + loss2
            print(loss)
            loss.backward()
        trainer.step(batch_size)
        cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        box_metric.update([box_target], [box_predictions * box_mask])
        if (i + 1) % log_interval == 0:
            name1, val1 = cls_metric.get()
            name2, val2 = box_metric.get()
            print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                  % (epoch, i, batch_size / (time.time() - btic), name1, val1, name2, val2))
    name1, val1 = cls_metric.get()
    name2, val2 = box_metric.get()
    print('[Epoch %d] training: %s=%f, %s=%f' % (epoch, name1, val1, name2, val2))
    print('[Epoch %d] time cost: %f' % (epoch, time.time() - tic))

# we can save the trained parameters to disk
net.save_params('ssd_%d.params' % epochs)
```
If helpful, I can paste in the network architecture as well; it's essentially the same as in the Gluon tutorial linked above, with the small exception that my input data shape is `(400, 400)`.
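
As the error message suggests, I could probably also avoid deferred initialization altogether by declaring the input sizes on each layer. A minimal sketch of what I believe that looks like (placeholder sizes, not my real architecture):
```
from mxnet import gluon

# With in_channels / in_units declared, initialize() allocates weights immediately
# instead of waiting for the first forward pass.
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(gluon.nn.Conv2D(channels=16, kernel_size=3, in_channels=3))
    net.add(gluon.nn.GlobalAvgPool2D())
    net.add(gluon.nn.Dense(10, in_units=16))
net.initialize()

for name, param in net.collect_params().items():
    print(name, param.shape)  # concrete shapes before any data is seen
```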