Posted to commits@mxnet.apache.org by qk...@apache.org on 2017/08/03 22:22:11 UTC
[incubator-mxnet] branch master updated: [R] update docs from
mx.symbol.MakeLoss. close #2922 (#7325)
This is an automated email from the ASF dual-hosted git repository.
qkou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new dd4512f [R] update docs from mx.symbol.MakeLoss. close #2922 (#7325)
dd4512f is described below
commit dd4512f82051711240adc301033e52bec7998abf
Author: Qiang Kou (KK) <qk...@qkou.info>
AuthorDate: Thu Aug 3 22:22:09 2017 +0000
[R] update docs from mx.symbol.MakeLoss. close #2922 (#7325)
---
R-package/vignettes/CustomLossFunction.Rmd | 159 +++++++++++++++++++++
docs/tutorials/r/CustomLossFunction.md | 220 ++++++++++++++++++++++++-----
2 files changed, 341 insertions(+), 38 deletions(-)
diff --git a/R-package/vignettes/CustomLossFunction.Rmd b/R-package/vignettes/CustomLossFunction.Rmd
new file mode 100644
index 0000000..1817109
--- /dev/null
+++ b/R-package/vignettes/CustomLossFunction.Rmd
@@ -0,0 +1,159 @@
+---
+title: "Customized loss function"
+output:
+ md_document:
+ variant: markdown_github
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+This tutorial provides guidelines for using a customized loss function in network construction.
+
+Model Training Example
+----------
+
+Let's begin with a small regression example. We can build and train a regression model with the following code:
+
+```{r}
+data(BostonHousing, package = "mlbench")
+BostonHousing[, sapply(BostonHousing, is.factor)] <-
+ as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
+BostonHousing <- data.frame(scale(BostonHousing))
+
+test.ind = seq(1, 506, 5) # 1 pt in 5 used for testing
+train.x = data.matrix(BostonHousing[-test.ind,-14])
+train.y = BostonHousing[-test.ind, 14]
+test.x = data.matrix(BostonHousing[test.ind, -14])
+test.y = BostonHousing[test.ind, 14]
+
+require(mxnet)
+
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
+
+mx.set.seed(0)
+model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+
+pred <- predict(model, test.x)
+sum((test.y - pred[1,])^2) / length(test.y)
+```
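The last line above is the mean squared error on the held-out rows. As a quick base-R sanity check of that metric (the vectors below are made-up toy values for illustration; no mxnet required):

```r
# Toy label/prediction vectors (hypothetical values, for illustration only)
label <- c(1.0, 2.0, 3.0, 4.0)
pred  <- c(1.1, 1.9, 3.2, 3.8)

# Mean squared error, computed exactly as in the chunk above
mse <- sum((label - pred)^2) / length(label)
print(mse)  # 0.025
```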
+
+Besides `LinearRegressionOutput`, MXNet also provides `LogisticRegressionOutput` and `MAERegressionOutput`.
+However, these predefined losses might not be enough for real-world models. You can provide your own loss function
+by using `mx.symbol.MakeLoss` when constructing the network.
+
+How to Use Your Own Loss Function
+---------
+
+We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred - label)^2` directly.
+
+```{r}
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
+```
+
+Then we can train the network just as usual.
+
+```{r}
+mx.set.seed(0)
+model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+We would expect very similar results, because we are minimizing the same loss function.
+However, the reported value is quite different.
+
+```{r}
+pred2 <- predict(model2, test.x)
+sum((test.y - pred2)^2) / length(test.y)
+```
+
+This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data,
+not the prediction itself. We can recover the real prediction as shown below.
+
+```{r}
+internals = internals(model2$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model3 <- list(symbol = fc_symbol,
+ arg.params = model2$arg.params,
+ aux.params = model2$aux.params)
+
+class(model3) <- "MXFeedForwardModel"
+
+pred3 <- predict(model3, test.x)
+sum((test.y - pred3[1,])^2) / length(test.y)
+```
+
+Many operations are provided on symbols, so other losses are easy to express. An example using the absolute error `|pred - label|` is shown below.
+
+```{r}
+lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
+mx.set.seed(0)
+model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+
+internals = internals(model4$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model5 <- list(symbol = fc_symbol,
+ arg.params = model4$arg.params,
+ aux.params = model4$aux.params)
+
+class(model5) <- "MXFeedForwardModel"
+
+pred5 <- predict(model5, test.x)
+sum(abs(test.y - pred5[1,])) / length(test.y)
+```
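The absolute-error loss optimizes MAE rather than MSE. A small base-R illustration (made-up toy numbers, no mxnet needed) shows why that choice matters: a single large residual inflates MSE quadratically but MAE only linearly:

```r
label <- c(1, 2, 3, 4, 50)   # last point is an outlier (made-up data)
pred  <- c(1, 2, 3, 4, 5)    # a fit that ignores the outlier

mae <- sum(abs(label - pred)) / length(label)
mse <- sum((label - pred)^2) / length(label)

print(mae)  # 9   : the outlier contributes linearly
print(mse)  # 405 : the same outlier dominates quadratically
```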
+
+
+For comparison, the built-in `MAERegressionOutput` trains the same absolute-error model:
+
+```{r}
+lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
+mx.set.seed(0)
+model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+pred6 <- predict(model6, test.x)
+sum(abs(test.y - pred6[1,])) / length(test.y)
+```
+
diff --git a/docs/tutorials/r/CustomLossFunction.md b/docs/tutorials/r/CustomLossFunction.md
index a710480..afb9951 100644
--- a/docs/tutorials/r/CustomLossFunction.md
+++ b/docs/tutorials/r/CustomLossFunction.md
@@ -3,57 +3,201 @@ Customized loss function
This tutorial provides guidelines for using customized loss function in network construction.
-
Model Training Example
-----------
+----------------------
Let's begin with a small regression example. We can build and train a regression model with the following code:
+``` r
+data(BostonHousing, package = "mlbench")
+BostonHousing[, sapply(BostonHousing, is.factor)] <-
+ as.numeric(as.character(BostonHousing[, sapply(BostonHousing, is.factor)]))
+BostonHousing <- data.frame(scale(BostonHousing))
+
+test.ind = seq(1, 506, 5) # 1 pt in 5 used for testing
+train.x = data.matrix(BostonHousing[-test.ind,-14])
+train.y = BostonHousing[-test.ind, 14]
+test.x = data.matrix(BostonHousing[test.ind, -14])
+test.y = BostonHousing[test.ind, 14]
+
+require(mxnet)
+```
+
+ ## Loading required package: mxnet
+
+``` r
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro <- mx.symbol.LinearRegressionOutput(fc2, name = "lro")
+
+mx.set.seed(0)
+model <- mx.model.FeedForward.create(lro, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+``` r
+pred <- predict(model, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred[1,])^2) / length(test.y)
+```
- ```r
- library(mxnet)
- data(BostonHousing, package="mlbench")
- train.ind = seq(1, 506, 3)
- train.x = data.matrix(BostonHousing[train.ind, -14])
- train.y = BostonHousing[train.ind, 14]
- test.x = data.matrix(BostonHousing[-train.ind, -14])
- test.y = BostonHousing[-train.ind, 14]
- data <- mx.symbol.Variable("data")
- fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
- lro <- mx.symbol.LinearRegressionOutput(fc1)
- mx.set.seed(0)
- model <- mx.model.FeedForward.create(
- lro, X=train.x, y=train.y,
- eval.data=list(data=test.x, label=test.y),
- ctx=mx.cpu(), num.round=10, array.batch.size=20,
- learning.rate=2e-6, momentum=0.9, eval.metric=mx.metric.rmse)
- ```
-
-Besides the `LinearRegressionOutput`, we also provide `LogisticRegressionOutput` and `MAERegressionOutput`.
-However, this might not be enough for real-world models. You can provide your own loss function
-by using `mx.symbol.MakeLoss` when constructing the network.
+ ## [1] 0.2485236
+Besides `LinearRegressionOutput`, MXNet also provides `LogisticRegressionOutput` and `MAERegressionOutput`. However, these predefined losses might not be enough for real-world models. You can provide your own loss function by using `mx.symbol.MakeLoss` when constructing the network.
How to Use Your Own Loss Function
----------
+---------------------------------
+
+We still use our previous example, but this time we use `mx.symbol.MakeLoss` to minimize `(pred - label)^2` directly.
+
+``` r
+data <- mx.symbol.Variable("data")
+label <- mx.symbol.Variable("label")
+fc1 <- mx.symbol.FullyConnected(data, num_hidden = 14, name = "fc1")
+tanh1 <- mx.symbol.Activation(fc1, act_type = "tanh", name = "tanh1")
+fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden = 1, name = "fc2")
+lro2 <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc2, shape = 0) - label), name="lro2")
+```
+
+Then we can train the network just as usual.
+
+``` r
+mx.set.seed(0)
+model2 <- mx.model.FeedForward.create(lro2, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 5,
+ array.batch.size = 60,
+ optimizer = "rmsprop",
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+We would expect very similar results, because we are minimizing the same loss function. However, the reported value is quite different.
+
+``` r
+pred2 <- predict(model2, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred2)^2) / length(test.y)
+```
+
+ ## [1] 1.234584
+
+This is because the output of `mx.symbol.MakeLoss` is the gradient of the loss with respect to the input data, not the prediction itself. We can recover the real prediction as shown below.
+
+``` r
+internals = internals(model2$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model3 <- list(symbol = fc_symbol,
+ arg.params = model2$arg.params,
+ aux.params = model2$aux.params)
+
+class(model3) <- "MXFeedForwardModel"
+
+pred3 <- predict(model3, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum((test.y - pred3[1,])^2) / length(test.y)
+```
+
+ ## [1] 0.248294
+
+Many operations are provided on symbols, so other losses are easy to express. An example using the absolute error `|pred - label|` is shown below.
+
+``` r
+lro_abs <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc2, shape = 0) - label))
+mx.set.seed(0)
+model4 <- mx.model.FeedForward.create(lro_abs, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
+
+``` r
+internals = internals(model4$symbol)
+fc_symbol = internals[[match("fc2_output", outputs(internals))]]
+
+model5 <- list(symbol = fc_symbol,
+ arg.params = model4$arg.params,
+ aux.params = model4$aux.params)
+
+class(model5) <- "MXFeedForwardModel"
+
+pred5 <- predict(model5, test.x)
+```
+
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
+
+``` r
+sum(abs(test.y - pred5[1,])) / length(test.y)
+```
+
+ ## [1] 0.7056902
+
+For comparison, the built-in `MAERegressionOutput` trains the same absolute-error model:
+
+``` r
+lro_mae <- mx.symbol.MAERegressionOutput(fc2, name = "lro")
+mx.set.seed(0)
+model6 <- mx.model.FeedForward.create(lro_mae, X = train.x, y = train.y,
+ ctx = mx.cpu(),
+ num.round = 20,
+ array.batch.size = 60,
+ optimizer = "sgd",
+ learning.rate = 0.001,
+ verbose = TRUE,
+ array.layout = "rowmajor",
+ batch.end.callback = NULL,
+ epoch.end.callback = NULL)
+```
+
+ ## Start training with 1 devices
-We still use our previous example.
+``` r
+pred6 <- predict(model6, test.x)
+```
- ```r
- library(mxnet)
- data <- mx.symbol.Variable("data")
- fc1 <- mx.symbol.FullyConnected(data, num_hidden=1)
- lro <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fc1, shape = 0) - label))
- ```
+ ## Warning in mx.model.select.layout.predict(X, model): Auto detect layout of input matrix, use rowmajor..
-In the last line of network definition, we do not use the predefined loss function. We define the loss
-by ourselves, which is `(pred-label)^2`.
+``` r
+sum(abs(test.y - pred6[1,])) / length(test.y)
+```
-We have provided many operations on the symbols, so you can also define `|pred-label|` using the line below.
+ ## [1] 0.7056902
- ```r
- lro <- mx.symbol.MakeLoss(mx.symbol.abs(mx.symbol.Reshape(fc1, shape = 0) - label))
- ```
## Next Steps
* [Neural Networks with MXNet in Five Minutes](http://mxnet.io/tutorials/r/fiveMinutesNeuralNetwork.html)