Posted to issues@mxnet.apache.org by "Leo Dirac (JIRA)" <ji...@apache.org> on 2018/10/15 17:21:00 UTC

[jira] [Commented] (MXNET-978) Support Higher Order Derivative

    [ https://issues.apache.org/jira/browse/MXNET-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650503#comment-16650503 ] 

Leo Dirac commented on MXNET-978:
---------------------------------

There's a lot of work to fully support this for all operators.  I think we should pick a prioritization philosophy that allows real applications to be built as quickly as possible. There are lots of applications once this works, like better optimization algorithms, GAN training, RL algorithms, neural architecture search, etc. Each of these requires having the second derivative for every op in a network, so we should pick useful network architectures where we can get the 2nd derivative for the entire network.
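For concreteness, here is a rough sketch (illustrative only; the toy "critic", shapes, and variable names are made up, and most ops don't yet support the second backward pass, which is what this epic tracks) of the kind of grad-of-grad usage a gradient-penalty-style GAN objective needs, going through the existing mx.autograd.grad(..., create_graph=True) entry point:

    # Sketch of double-backward usage for a gradient-penalty-style term.
    # Every op recorded below must provide a second-order gradient for the
    # final backward() call to succeed.
    import mxnet as mx
    from mxnet import autograd, nd

    x = nd.random.normal(shape=(4, 8))
    x.attach_grad()

    with autograd.record():
        # toy "critic": one fully-connected layer followed by ReLU
        w = nd.random.normal(shape=(8, 1))
        y = nd.relu(nd.dot(x, w)).sum()
        # first derivative, kept in the graph so we can differentiate it again
        dx = autograd.grad(y, [x], create_graph=True, retain_graph=True)[0]
        # gradient-penalty-style term built from the first derivative
        grad_norm = nd.sqrt((dx * dx).sum(axis=1))
        penalty = ((grad_norm - 1) ** 2).mean()

    # second backward pass: d(penalty)/dx needs a 2nd-order gradient for
    # every op above (dot, relu, sum, sqrt, ...)
    penalty.backward()
    print(x.grad)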

Getting something working as quickly as possible implies, to me, starting with the simplest useful network architectures and then moving to progressively more complex ones, ordered by how useful/important they are. This makes me think the order should be approximately:
 * Fully-connected feedforward networks (multi-layer perceptron MLP)
 * CNN. Starting with AlexNet (simple) and then adding ResNet (common) and similar
 * RNN. Start with simple stacked RNN, then add LSTM, GRU, encoder-decoder, attention, transformer, etc
 * Everything else

Something like that for order of architecture types. But I do think it makes sense to start with MLP since that's the easiest way to get an end-to-end example working, and does cover some interesting real-world use cases. Also MLP requires a pretty short list of ops. I think it's basically:
 * Fully-connected layer (vector-matrix product)
 * Softmax output (most common output)
 * ReLU activation (most common; the 2nd derivative is also trivial, see the sketch after this list)
 * Dropout (not required for MLP, but very commonly used)
 * Batch-Norm (not required for MLP, but quite useful in my experience to keep the optimization well conditioned)
 * Anything else?
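To make the ReLU point above concrete, here is a tiny plain-NumPy illustration (independent of MXNet, with arbitrary sample values) of why its second derivative is trivial: the first derivative is a 0/1 step function, so away from the kink at zero the second derivative is identically zero, and the second-order backward for ReLU only has to propagate zeros:

    import numpy as np

    x = np.array([-2.0, -0.5, 0.5, 2.0])
    relu = np.maximum(x, 0)                # forward
    d_relu = (x > 0).astype(x.dtype)       # first derivative: 0/1 step function
    d2_relu = np.zeros_like(x)             # second derivative: zero almost everywhere

    print(np.vstack([x, relu, d_relu, d2_relu]).T)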

 

> Support Higher Order Derivative
> -------------------------------
>
>                 Key: MXNET-978
>                 URL: https://issues.apache.org/jira/browse/MXNET-978
>             Project: Apache MXNet
>          Issue Type: Epic
>          Components: Apache MXNet Backend
>            Reporter: Lin Yuan
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org