Posted to dev@mahout.apache.org by "Herta, Christian" <Ch...@htw-berlin.de> on 2012/02/06 15:38:43 UTC
Bug in Gradient Machine?
Hello,
yesterday I checked the code of the gradient machine to understand what's going
on there. I think I found a bug in the computation of the gradient (trunk):
In the comment it's written: "dy / dw is just w since y = x' * w + b."
This is wrong: dy/dw is x (ignoring the indices). The same mistake is made in the code.
See the corrected version below.
----
The gradient machine is a specialized version of a multi-layer perceptron (MLP).
In an MLP the gradient for computing the "weight change" for the output units is:
dE / dw_ij = dE / dz_i * dz_i / dw_ij with z_i = sum_j (w_ij * a_j)
here: i index of the output layer; j index of the hidden layer
(d stands for the partial derivatives)
here: z_i = a_i (no squashing in the output layer)
The special loss (cost) function here is E = 1 - a_g + a_b = 1 - z_g + z_b
with
g: index of the output unit with target value +1 (positive class)
b: index of a random output unit with target value 0
=>
dE / dw_gj = dE/dz_g * dz_g/dw_gj = -1 * a_j (a_j: activity of hidden unit j)
dE / dw_bj = dE/dz_b * dz_b/dw_bj = +1 * a_j
That's the same result the comment would give if it were correct:
dy/dw = x (x is here the activation of the hidden unit), times (-1) for the weights to
the output unit with target value +1.
In neural network implementations it's common to compute the gradient
numerically to test the implementation. This can be done by:
dE/dw_ij = (E(w_ij + epsilon) - E(w_ij - epsilon)) / (2 * epsilon)
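As a quick illustration of such a check (a minimal, self-contained Java sketch, not
Mahout code; all names below are made up for the example), the following verifies
numerically that for y = x' * w + b the derivative dy/dw_j is x_j, not w_j:

import java.util.Random;

public class DyDwCheck {

  // y = x' * w + b
  static double y(double[] x, double[] w, double b) {
    double sum = b;
    for (int j = 0; j < x.length; j++) {
      sum += x[j] * w[j];
    }
    return sum;
  }

  public static void main(String[] args) {
    Random gen = new Random(42);
    int n = 5;
    double[] x = new double[n];
    double[] w = new double[n];
    for (int j = 0; j < n; j++) {
      x[j] = gen.nextGaussian();
      w[j] = gen.nextGaussian();
    }
    double b = gen.nextGaussian();
    double eps = 1.0e-6;
    for (int j = 0; j < n; j++) {
      // central difference: (y(w_j + eps) - y(w_j - eps)) / (2 * eps)
      w[j] += eps;
      double yPlus = y(x, w, b);
      w[j] -= 2 * eps;
      double yMinus = y(x, w, b);
      w[j] += eps;  // restore the original weight
      double numeric = (yPlus - yMinus) / (2 * eps);
      System.out.printf("j=%d  numeric dy/dw_j=%.6f  x_j=%.6f  w_j=%.6f%n",
          j, numeric, x[j], w[j]);
    }
  }
}

The numeric derivative agrees with x_j up to floating point error and in general
differs from w_j.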
Cheers
Christian
-----------------------------------
// Note from the loss above, the gradient dloss/dy (y being the output for the label)
// is -1 for good and +1 for bad.
// dy / dw is just z since y = z' * w + b.
// Hence by the chain rule, dloss / dw_ij = dloss / dy_i * dy_i / dw_ij = -z_j (for i = g).
// For the regularization part, 0.5 * lambda * w' w, the gradient is lambda * w.
// dy / db = 1.
// gradient descent update of the weights to the
// positive (should-be) output-unit
Vector gradGood = hiddenActivations.clone();
gradGood.assign(Functions.NEGATE);
gradGood.assign(Functions.mult(-learningRate * (1.0 - regularization)));
outputWeights[good].assign(gradGood, Functions.PLUS);
outputBias.setQuick(good, outputBias.get(good) + learningRate);
// gradient descent update of the weights to the
// (random) negative (should-be) output-unit
Vector gradBad = hiddenActivations.clone();
gradBad.assign(Functions.mult(-learningRate * (1.0 + regularization)));
outputWeights[bad].assign(gradBad, Functions.PLUS);
outputBias.setQuick(bad, outputBias.get(bad) - learningRate);
// backpropagation from output to hidden layer for
// computing the deltas (errors) of the hidden units
Vector propHidden = outputWeights[good].clone();
propHidden.assign(Functions.NEGATE);
propHidden.assign(outputWeights[bad], Functions.PLUS);
// Gradient of sigmoid (logistic function) is s * (1 - s).
Vector gradSig = hiddenActivations.clone();
gradSig.assign(Functions.SIGMOIDGRADIENT);
// Multiply by the change caused by the ranking loss.
for (int i = 0; i < numHidden; i++) {
  gradSig.setQuick(i, gradSig.get(i) * propHidden.get(i));
}
// gradSig now holds the deltas (errors) of the hidden units.
// The weight change of w_ij should be proportional
// to delta_i * x_j + regularization * w_ij.
for (int i = 0; i < numHidden; i++) {
  for (int j = 0; j < numFeatures; j++) {
    double v = hiddenWeights[i].get(j);
    v -= learningRate * (gradSig.get(i) + regularization * v);
    hiddenWeights[i].setQuick(j, v);
  }
}
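The same kind of numerical check can be applied to the ranking loss itself. Below is a
rough, self-contained sketch (plain Java, not the Mahout implementation; all names are
made up for the example). It fixes the hidden activations a_j, computes
E = 1 - z_g + z_b with linear output units, and compares the central-difference
gradient of the output weights against the values derived above: -a_j for the good
unit and +a_j for the bad one.

import java.util.Random;

public class RankingLossGradientCheck {

  // Linear output units: z_i = sum_j (w_ij * a_j) + b_i (no squashing in the output layer).
  static double[] outputs(double[][] w, double[] bias, double[] a) {
    double[] z = new double[w.length];
    for (int i = 0; i < w.length; i++) {
      z[i] = bias[i];
      for (int j = 0; j < a.length; j++) {
        z[i] += w[i][j] * a[j];
      }
    }
    return z;
  }

  // Ranking loss E = 1 - z_g + z_b for the good unit g and a (random) bad unit b.
  static double loss(double[][] w, double[] bias, double[] a, int good, int bad) {
    double[] z = outputs(w, bias, a);
    return 1.0 - z[good] + z[bad];
  }

  public static void main(String[] args) {
    Random gen = new Random(1);
    int numHidden = 4;
    int numOutput = 3;
    double[] a = new double[numHidden];              // hidden activations, held fixed here
    double[][] w = new double[numOutput][numHidden];
    double[] bias = new double[numOutput];
    for (int j = 0; j < numHidden; j++) {
      a[j] = gen.nextDouble();
    }
    for (int i = 0; i < numOutput; i++) {
      bias[i] = gen.nextGaussian();
      for (int j = 0; j < numHidden; j++) {
        w[i][j] = gen.nextGaussian();
      }
    }
    int good = 0;
    int bad = 2;
    double eps = 1.0e-6;
    for (int j = 0; j < numHidden; j++) {
      // dE/dw_gj via central differences; analytically it is -a_j.
      w[good][j] += eps;
      double plus = loss(w, bias, a, good, bad);
      w[good][j] -= 2 * eps;
      double minus = loss(w, bias, a, good, bad);
      w[good][j] += eps;
      double numericGood = (plus - minus) / (2 * eps);
      // dE/dw_bj via central differences; analytically it is +a_j.
      w[bad][j] += eps;
      plus = loss(w, bias, a, good, bad);
      w[bad][j] -= 2 * eps;
      minus = loss(w, bias, a, good, bad);
      w[bad][j] += eps;
      double numericBad = (plus - minus) / (2 * eps);
      System.out.printf("j=%d  dE/dw_gj=%.6f (expect %.6f)  dE/dw_bj=%.6f (expect %.6f)%n",
          j, numericGood, -a[j], numericBad, a[j]);
    }
  }
}

A check along these lines, run against the actual GradientMachine update, would make
the w-vs-x question above easy to settle.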
Prof. Dr. Christian Herta
HTW Berlin
Wilhelminenhofstraße 75A,
12459 Berlin, Gebäude C, Raum: 613
Email: christian.herta@htw-berlin.de
Telefon: (030) 5019-3498
Fax: (030) 5019-483498