Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/01 06:13:12 UTC
[GitHub] [incubator-mxnet] Ian8888 opened a new issue #14854: Create Resnet152 training
Ian8888 opened a new issue #14854: Create Resnet152 training
URL: https://github.com/apache/incubator-mxnet/issues/14854
I can't train ResNet-152 using a model created with vision.get_model.
My code is:
```
resnet152 = vision.get_model('resnet152_v2', classes =2)(mx.sym.var('data'))
mod = mx.mod.Module(symbol=resnet152,context=mx.gpu())
mod.bind(data_shapes = train_dataiter.provide_data, label_shapes = None)
mod.init_params(mx.init.Xavier(factor_type = "in", magnitude = 2.34))
kv = mx.kvstore.create('local')
op = mx.optimizer.create('adam', rescale_grad = (1.0 / batch_size), lr_scheduler = mx.lr_scheduler.FactorScheduler(step = int(epoch_size * 6), factor = 0.88, stop_factor_lr = 5e-15), learning_rate = lr, beta1 = 0.9, wd = 0.00001)
checkpoint = mx.callback.do_checkpoint(Model_save_dir + Model_save_name)
mod.fit(train_dataiter, test_dataiter, num_epoch = 280, batch_end_callback = mx.callback.Speedometer(batch_size, 100), kvstore = kv, optimizer = op, epoch_end_callback = checkpoint)
```
The error message is:
> MXNetError Traceback (most recent call last)
> <ipython-input-71-ee679e7da4b6> in <module>()
> 26 checkpoint = mx.callback.do_checkpoint(Model_save_dir + Model_save_name)
> 27
> ---> 28 mod.fit(train_dataiter, test_dataiter, num_epoch = 280, batch_end_callback = mx.callback.Speedometer(batch_size, 100), kvstore = kv, optimizer = op, epoch_end_callback = checkpoint)
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/base_module.pyc in fit(self, train_data, eval_data, eval_metric, epoch_end_callback, batch_end_callback, kvstore, optimizer, optimizer_params, eval_end_callback, eval_batch_end_callback, initializer, arg_params, aux_params, allow_missing, force_rebind, force_init, begin_epoch, num_epoch, validation_metric, monitor, sparse_row_id_fn)
> 526 if monitor is not None:
> 527 monitor.tic()
> --> 528 self.forward_backward(data_batch)
> 529 self.update()
> 530
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/base_module.pyc in forward_backward(self, data_batch)
> 195 """A convenient function that calls both ``forward`` and ``backward``."""
> 196 self.forward(data_batch, is_train=True)
> --> 197 self.backward()
> 198
> 199 def score(self, eval_data, eval_metric, num_batch=None, batch_end_callback=None,
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/module.pyc in backward(self, out_grads)
> 640 """
> 641 assert self.binded and self.params_initialized
> --> 642 self._exec_group.backward(out_grads=out_grads)
> 643
> 644 def update(self):
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/executor_group.pyc in backward(self, out_grads)
> 597 else:
> 598 out_grads_slice.append(grad.copyto(self.contexts[i]))
> --> 599 exec_.backward(out_grads=out_grads_slice)
> 600
> 601 def update_metric(self, eval_metric, labels, pre_sliced):
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/executor.pyc in backward(self, out_grads, is_train)
> 233 mx_uint(len(out_grads)),
> 234 ndarray,
> --> 235 ctypes.c_int(is_train)))
> 236
> 237 def set_monitor_callback(self, callback):
>
> /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/base.pyc in check_call(ret)
> 250 """
> 251 if ret != 0:
> --> 252 raise MXNetError(py_str(_LIB.MXGetLastError()))
> 253
> 254
>
> MXNetError: [13:54:23] src/executor/graph_executor.cc:82: Check failed: i < head_grads.size() && !head_grads[i].is_none() Because the last operator is not Loss function, head_gradient is required when calling backward. If you are attempting to minimize the output as an objective, please modify your network and pass it through the make_loss symbol.
>
> Stack trace returned 10 entries:
> [bt] (0) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3f23c2) [0x7fbfa675d3c2]
> [bt] (1) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3f2988) [0x7fbfa675d988]
> [bt] (2) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Backward(std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, bool)+0x1ba) [0x7fbfa944348a]
> [bt] (3) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(MXExecutorBackwardEx+0x1ef) [0x7fbfa93aaa1f]
> [bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fc01b763e40]
> [bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fc01b7638ab]
> [bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fc01b9733df]
> [bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fc01b977d82]
> [bt] (8) /usr/bin/python(PyEval_EvalFrameEx+0x578d) [0x4c166d]
> [bt] (9) /usr/bin/python(PyEval_EvalCodeEx+0x306) [0x4b9b66]
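
The check quoted above fails because `backward()` was called with no head gradients on a network whose last operator is not a loss. As a rough illustration only (plain Python, not MXNet code; the function names and logit values are hypothetical), the sketch below computes the quantity a softmax cross-entropy loss layer would supply at the output so that backpropagation has a starting point:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Scalar loss: negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def head_gradient(logits, label):
    """Gradient of the cross-entropy loss with respect to the logits.

    For softmax cross-entropy this is softmax(logits) minus the one-hot
    label. It is the starting point that backpropagation needs at the
    network output.
    """
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


# Two-class example, matching classes=2 in the post; the logits are made up.
logits = [2.0, -1.0]
label = 0
grad = head_gradient(logits, label)
print(grad)
```

In MXNet's symbolic API, a loss operator such as `mx.sym.SoftmaxOutput`, or the `make_loss` symbol named in the error, plays this role. Whether appending one to the `get_model` output resolves this particular report is an assumption and is not confirmed in the post.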