Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/12/05 06:49:28 UTC

[GitHub] taliesinb opened a new pull request #8949: New layer: split_like.

URL: https://github.com/apache/incubator-mxnet/pull/8949
 
 
   ## Description ##
   
    This PR introduces a `split_like` layer, similar to the existing `reshape_like` layer. The layer takes the first dim of the input shape (lhs) and splits it into two dims, taken from the first two dims of the reference shape (rhs).
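    
    To make the semantics concrete, here is a NumPy sketch of what the op computes (the function is a reference model for illustration, not the PR's implementation):
    
    ```python
    import numpy as np
    
    def split_like(lhs, rhs):
        # Split lhs's first dim into the first two dims of rhs's shape.
        seq, batch = rhs.shape[0], rhs.shape[1]
        assert lhs.shape[0] == seq * batch
        return lhs.reshape((seq, batch) + lhs.shape[1:])
    
    lhs = np.zeros((6, 5))     # first dim is 2 * 3
    rhs = np.zeros((2, 3, 7))  # reference: first two dims are (2, 3)
    print(split_like(lhs, rhs).shape)  # -> (2, 3, 5)
    ```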
   
    This layer implements a simple operation, but one with a big payoff for high-level frameworks that want to do fast bucketing. Fast bucketing requires that the network be unrolled and compiled only once. On GPU, layers like the `RNN` layer make this possible. But there is an additional requirement: the sequence length must *never* appear in the `shape` parameter of any `Reshape` op, otherwise we are forced to build new symbols instead of simply reshaping existing executors.
   
    Currently, there is a common idiom that unfortunately requires hardcoding the sequence length into a `Reshape`. My `split_like` allows us to implement this idiom without hardcoding the sequence length. The idiom is a way to efficiently map an operation over a sequence (sketched in code after the list): 
   
    1) take a tensor with shape `in1 = (seq, batch, d1, d2, ...)`
    2) merge the first two dims with `Reshape` using `shape=(-3, -2)` to give `in2 = (seq * batch, d1, d2, ...)`
    3) perform some ordinary op (or ops) that naturally maps over the batch dimension to give `out1 = (seq * batch, e1, e2, ...)`
    4) split the first dimension of the output back into two to give `out2 = (seq, batch, e1, e2, ...)`
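    
    For concreteness, here is the idiom with MXNet's symbolic API as it stands; the concrete sizes and the `FullyConnected` op standing in for step 3 are illustrative only:
    
    ```python
    import mxnet as mx
    
    seq_len, batch, d, e = 10, 32, 64, 128   # illustrative sizes
    
    # step 1: a tensor of shape (seq, batch, d)
    in1 = mx.sym.Variable('data')
    
    # step 2: merge the first two dims -> (seq * batch, d)
    in2 = mx.sym.Reshape(data=in1, shape=(-3, -2))
    
    # step 3: any op that maps over the merged batch dim -> (seq * batch, e)
    out1 = mx.sym.FullyConnected(data=in2, num_hidden=e)
    
    # step 4: split the first dim back -> (seq, batch, e)
    # The problem: seq_len is hardcoded, so changing the bucket length
    # forces a new symbol instead of a reshaped executor.
    out2 = mx.sym.Reshape(data=out1, shape=(seq_len, -1, e))
    ```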
   
    `seq` doesn't have to come before `batch`; it could be the other way around, but the point is the same: in step 4 we need to reverse the split performed in step 2. Unfortunately, we cannot hardcode either the batch size or the max sequence length into a `Reshape` layer, because doing so prevents an existing executor from being resized when we wish to change that parameter. And with current MXNet ops there is no simple way to go from `seq * batch` back to `(seq, batch)` without knowing at least one of the two and hardcoding it into the `Reshape`.
   
    `split_like` makes this easy: you can just use the original tensor `in1` as the reference tensor, giving `out2 = split_like(lhs=out1, rhs=in1)`.
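    
    With the op from this PR, step 4 of the sketch above needs no hardcoded dimension (assuming the op is exposed as `mx.sym.split_like`; the exact binding depends on how the op is registered):
    
    ```python
    # step 4, revisited: in1 supplies the (seq, batch) factorization
    # of out1's first dim, so nothing is hardcoded.
    out2 = mx.sym.split_like(lhs=out1, rhs=in1)
    ```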
   
    When you have a large number of buckets, the performance savings from doing this can be enormous. It is possible to cache the original JSON and build a template from it so that new JSON can be produced more cheaply, but that is a complicated and hacky prospect compared to having a naturally resizable executor, which `split_like` allows.
   
   ## Checklist ##
   ### Essentials ###
   - [ ] Passed code style checking (`make lint`)
   I get `ModuleNotFoundError: No module named 'cpplint'`, but the code is mostly copy-pasted from `reshape_like`, so it is probably ok?
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage
   `split_like` is almost a direct copy of `reshape_like`, which seems to have no direct test coverage. What should I do? 
   - [x] For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented. 
   - [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
    (Not sure how to fill in the above section.)
   
   ## Comments ##
   This seems like the only simple way of achieving executor resizability for the idiom I mentioned, but maybe I'm missing something and it is already possible!
