Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/01 06:13:12 UTC

[GitHub] [incubator-mxnet] Ian8888 opened a new issue #14854: Create Resnet152 training

Ian8888 opened a new issue #14854: Create Resnet152 training 
URL: https://github.com/apache/incubator-mxnet/issues/14854
 
 
   I can't train ResNet152 with a model obtained from vision.get_model.
   My code is:
   ```
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision  # assuming vision here is the Gluon model zoo

   # train_dataiter, test_dataiter, batch_size, epoch_size, lr,
   # Model_save_dir and Model_save_name are defined earlier in the notebook
   resnet152 = vision.get_model('resnet152_v2', classes=2)(mx.sym.var('data'))

   mod = mx.mod.Module(symbol=resnet152, context=mx.gpu())
   mod.bind(data_shapes=train_dataiter.provide_data, label_shapes=None)
   mod.init_params(mx.init.Xavier(factor_type="in", magnitude=2.34))
   kv = mx.kvstore.create('local')
   op = mx.optimizer.create('adam', rescale_grad=(1.0 / batch_size), lr_scheduler=mx.lr_scheduler.FactorScheduler(step=int(epoch_size * 6), factor=0.88, stop_factor_lr=5e-15), learning_rate=lr, beta1=0.9, wd=0.00001)
   checkpoint = mx.callback.do_checkpoint(Model_save_dir + Model_save_name)
   mod.fit(train_dataiter, test_dataiter, num_epoch=280, batch_end_callback=mx.callback.Speedometer(batch_size, 100), kvstore=kv, optimizer=op, epoch_end_callback=checkpoint)
   ```
   
   The error message is:
   
   > MXNetError                                Traceback (most recent call last)
   > <ipython-input-71-ee679e7da4b6> in <module>()
   >      26 checkpoint = mx.callback.do_checkpoint(Model_save_dir + Model_save_name)
   >      27 
   > ---> 28 mod.fit(train_dataiter, test_dataiter, num_epoch = 280, batch_end_callback = mx.callback.Speedometer(batch_size, 100), kvstore = kv, optimizer = op, epoch_end_callback = checkpoint)
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/base_module.pyc in fit(self, train_data, eval_data, eval_metric, epoch_end_callback, batch_end_callback, kvstore, optimizer, optimizer_params, eval_end_callback, eval_batch_end_callback, initializer, arg_params, aux_params, allow_missing, force_rebind, force_init, begin_epoch, num_epoch, validation_metric, monitor, sparse_row_id_fn)
   >     526                 if monitor is not None:
   >     527                     monitor.tic()
   > --> 528                 self.forward_backward(data_batch)
   >     529                 self.update()
   >     530 
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/base_module.pyc in forward_backward(self, data_batch)
   >     195         """A convenient function that calls both ``forward`` and ``backward``."""
   >     196         self.forward(data_batch, is_train=True)
   > --> 197         self.backward()
   >     198 
   >     199     def score(self, eval_data, eval_metric, num_batch=None, batch_end_callback=None,
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/module.pyc in backward(self, out_grads)
   >     640         """
   >     641         assert self.binded and self.params_initialized
   > --> 642         self._exec_group.backward(out_grads=out_grads)
   >     643 
   >     644     def update(self):
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/module/executor_group.pyc in backward(self, out_grads)
   >     597                 else:
   >     598                     out_grads_slice.append(grad.copyto(self.contexts[i]))
   > --> 599             exec_.backward(out_grads=out_grads_slice)
   >     600 
   >     601     def update_metric(self, eval_metric, labels, pre_sliced):
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/executor.pyc in backward(self, out_grads, is_train)
   >     233             mx_uint(len(out_grads)),
   >     234             ndarray,
   > --> 235             ctypes.c_int(is_train)))
   >     236 
   >     237     def set_monitor_callback(self, callback):
   > 
   > /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/base.pyc in check_call(ret)
   >     250     """
   >     251     if ret != 0:
   > --> 252         raise MXNetError(py_str(_LIB.MXGetLastError()))
   >     253 
   >     254 
   > 
   > MXNetError: [13:54:23] src/executor/graph_executor.cc:82: Check failed: i < head_grads.size() && !head_grads[i].is_none() Because the last operator is not Loss function, head_gradient is required when calling backward. If you are attempting to minimize the output as an objective, please modify your network and pass it through the make_loss symbol.
   > 
   > Stack trace returned 10 entries:
   > [bt] (0) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3f23c2) [0x7fbfa675d3c2]
   > [bt] (1) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x3f2988) [0x7fbfa675d988]
   > [bt] (2) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Backward(std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, bool)+0x1ba) [0x7fbfa944348a]
   > [bt] (3) /home/mxnetcv309ii/.local/lib/python2.7/site-packages/mxnet/libmxnet.so(MXExecutorBackwardEx+0x1ef) [0x7fbfa93aaa1f]
   > [bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fc01b763e40]
   > [bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fc01b7638ab]
   > [bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fc01b9733df]
   > [bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fc01b977d82]
   > [bt] (8) /usr/bin/python(PyEval_EvalFrameEx+0x578d) [0x4c166d]
   > [bt] (9) /usr/bin/python(PyEval_EvalCodeEx+0x306) [0x4b9b66]
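
   If I read the check failure correctly, the symbol returned by calling the Gluon model on mx.sym.var('data') ends in a plain Dense output rather than a loss operator, so backward() has no head gradient to start from. With the Module API the usual approach seems to be to append a loss symbol (for example mx.sym.SoftmaxOutput for classification, or the make_loss symbol the error mentions) and to bind with label shapes as well. A minimal, unverified sketch of that change, reusing the variables from the snippet above and assuming the data iterator's label name is the default 'softmax_label':

   ```
   import mxnet as mx
   from mxnet.gluon.model_zoo import vision  # assuming vision is the Gluon model zoo

   # Build the symbolic graph of the Gluon model; its output is raw class scores.
   net = vision.get_model('resnet152_v2', classes=2)
   out = net(mx.sym.var('data'))

   # Append a softmax cross-entropy head so backward() has a loss to differentiate.
   sym = mx.sym.SoftmaxOutput(data=out, name='softmax')

   mod = mx.mod.Module(symbol=sym, context=mx.gpu(),
                       data_names=['data'], label_names=['softmax_label'])
   # Bind with label shapes too, so fit() can feed labels to the loss.
   mod.bind(data_shapes=train_dataiter.provide_data,
            label_shapes=train_dataiter.provide_label)
   mod.init_params(mx.init.Xavier(factor_type="in", magnitude=2.34))
   mod.fit(train_dataiter, test_dataiter, num_epoch=280, kvstore=kv, optimizer=op,
           batch_end_callback=mx.callback.Speedometer(batch_size, 100),
           epoch_end_callback=checkpoint)
   ```

   Is this the intended way to train a Gluon model-zoo network with mx.mod.Module, or is there a recommended path that does not require adding the extra SoftmaxOutput layer?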
