Posted to dev@mahout.apache.org by "Christian Herta (Created) (JIRA)" <ji...@apache.org> on 2012/02/07 12:54:59 UTC
[jira] [Created] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Bug in Gradient Machine - Computation of the gradient
------------------------------------------------------
Key: MAHOUT-975
URL: https://issues.apache.org/jira/browse/MAHOUT-975
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.7
Reporter: Christian Herta
The initialisation used to compute the gradient descent weight updates for the output units appears to be wrong:
The comment says: "dy / dw is just w since y = x' * w + b."
This is incorrect: dy/dw is x, not w (ignoring the indices). The code performs the same faulty initialisation.
Check by using neural network terminology:
The gradient machine is a specialized version of a multi layer perceptron (MLP).
In a MLP the gradient for computing the "weight change" for the output units is:
dE / dw_ij = dE / dz_i * dz_i / dw_ij with z_i = sum_j (w_ij * a_j)
here: i is the index of an output unit; j the index of a hidden unit
(d stands for the partial derivative)
here: z_i = a_i (no squashing in the output layer)
The special loss (cost) function is E = 1 - a_g + a_b = 1 - z_g + z_b
with
g: index of the output unit with target value +1 (positive class)
b: index of a random output unit with target value 0
=>
dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j (a_j: activity of hidden unit j)
dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j
This is consistent with the corrected comment: dy/dw = x (where x is the activation of the hidden unit), multiplied by -1 for the weights to the output unit with target value +1.
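The derivation above can be sketched in Java. This is an illustrative standalone snippet (the class and method names are hypothetical, not Mahout's actual API):

```java
public class OutputGradient {
    // Gradients of the ranking loss E = 1 - z_g + z_b with respect to the
    // output weights, where z_i = sum_j (w_ij * a_j), so dz_i/dw_ij = a_j.

    // dE/dw_gj = dE/dz_g * dz_g/dw_gj = (-1) * a_j
    public static double gradGood(double aj) {
        return -aj;
    }

    // dE/dw_bj = dE/dz_b * dz_b/dw_bj = (+1) * a_j
    public static double gradBad(double aj) {
        return aj;
    }

    public static void main(String[] args) {
        double aj = 0.3; // activity of hidden unit j
        System.out.println("dE/dw_gj = " + gradGood(aj)); // -0.3
        System.out.println("dE/dw_bj = " + gradBad(aj));  // 0.3
    }
}
```

Note that the gradient for the output weights depends only on the hidden activation a_j and the sign of the loss term, which is exactly why initialising the update with the weight w instead of the activation x is wrong.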
------------
In neural network implementations it is common to check the gradient numerically to test the implementation. This can be done by:
dE/dw_ij = ( E(w_ij + epsilon) - E(w_ij - epsilon) ) / (2 * epsilon)
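A minimal sketch of such a central-difference check, using a toy one-weight loss (the setup and names are illustrative, not taken from the Mahout code):

```java
import java.util.function.DoubleUnaryOperator;

public class GradientCheck {
    // Central-difference estimate of dE/dw: perturb a single weight by
    // +/- epsilon, re-evaluate the loss, and divide by 2 * epsilon.
    public static double centralDiff(DoubleUnaryOperator loss, double w, double eps) {
        return (loss.applyAsDouble(w + eps) - loss.applyAsDouble(w - eps)) / (2.0 * eps);
    }

    public static void main(String[] args) {
        // Toy setting: one hidden activation a and one output weight w for the
        // positive-class unit, so z_g = w * a and E(w) = 1 - z_g (z_b held fixed).
        double a = 0.7;
        DoubleUnaryOperator loss = w -> 1.0 - w * a;
        double numeric = centralDiff(loss, 0.5, 1e-6);
        System.out.println(numeric); // approximately -0.7
        System.out.println(-a);      // analytic gradient dE/dw_g = -a
    }
}
```

If the analytic gradient in the implementation disagrees with the central-difference estimate beyond floating-point noise, the analytic computation is suspect.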
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206660#comment-13206660 ]
Lance Norskog commented on MAHOUT-975:
--------------------------------------
The newest patch does not compile against the trunk. There is a singular/plural problem with one of the variables.
I have tested this with the SGD classification example/bin/classify-20newsgroups.sh. The total accuracy dropped from 71% to 62%. The SGD example for Apache emails (a subset of commons vs. cocoon) does not work well, so I can't evaluate it with that.
Can you suggest a public dataset where this change works better than the trunk?
Re: [jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by Isabel Drost <is...@apache.org>.
First of all: Christian, welcome to Mahout :)
> Sorry for the singular/plural problem. I thought I had compiled the code
> before I produced the patch. I was not familiar with producing patches for
> Mahout.
I'm currently without connectivity, but there is documentation on generating
patches for Mahout on our wiki (click on documentation from our main page
to get to the wiki and search for patches there).
> Because the gradient machine is a very specialised version of an MLP, I
> started a more general implementation: MAHOUT-976. The SGD version of the
> MLP seems to be quite stable, but at the moment I only use it to train an
> autoencoder.
Agreeing with Hector: if your implementation is more general, you really
should just change the existing version instead of creating a new
implementation.
> I am going on holiday for 2 weeks without a computer. Then I will
> compare the bug-fixed Gradient Machine against the original version on
> training/test data.
Looking forward to your comparison.
Isabel
[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Christian Herta (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217540#comment-13217540 ]
Christian Herta commented on MAHOUT-975:
----------------------------------------
Sorry for the singular/plural problem. I thought I had compiled the code before I produced the patch. I was not familiar with producing patches for Mahout.
I only studied the code theoretically, because I was interested in whether there was a Multi Layer Perceptron (MLP) implementation in Mahout. But I am quite sure that the calculation of the gradient is not correct (see the bug description).
Because the gradient machine is a very specialised version of an MLP, I started a more general implementation: MAHOUT-976. The SGD version of the MLP seems to be quite stable, but at the moment I only use it to train an autoencoder.
I am going on holiday for 2 weeks without a computer. Then I will compare the bug-fixed Gradient Machine against the original version on training/test data.
[jira] [Commented] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Hector Yee (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217609#comment-13217609 ]
Hector Yee commented on MAHOUT-975:
-----------------------------------
If the MLP is a more general implementation, why don't you just delete Gradient Machine?
[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Herta updated MAHOUT-975:
-----------------------------------
Attachment: GradientMachine.patch
[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Herta updated MAHOUT-975:
-----------------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Herta updated MAHOUT-975:
-----------------------------------
Status: Patch Available (was: Open)
Patch for Mahout-975
[jira] [Updated] (MAHOUT-975) Bug in Gradient Machine - Computation of the gradient
Posted by "Christian Herta (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christian Herta updated MAHOUT-975:
-----------------------------------
Status: Patch Available (was: Open)
There is a patch attached to MAHOUT-975.
The code changes are small. I have also restructured the code and added some comments.