Perceptron Training Rule

The weights are adjusted according to:

\begin{displaymath}
W_j \leftarrow W_j + \alpha \times I_j \times Err \times g'(in)
\end{displaymath}

(This differs from the formula in the text. The text's formula is a workable variant and is, in fact, what the following derivation would yield if the activation function were the identity function.)
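
To spell out that remark: for the identity activation g(in) = in, the derivative g'(in) equals 1 everywhere, so the rule above reduces to

\begin{displaymath}
W_j \leftarrow W_j + \alpha \times I_j \times Err
\end{displaymath}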

If the activation function is a continuous, differentiable function (such as the sigmoid), the training rule is essentially implementing gradient descent on the squared error.
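
As a rough illustration, one update step for a single sigmoid unit might be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference code: Err is taken to be the target minus the unit's output, the squared error is taken to be one half of Err squared, and the function names (`train_step`, `squared_error`) are hypothetical.

```python
import math


def sigmoid(x):
    """Sigmoid activation g(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Derivative of the sigmoid: g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def weighted_sum(weights, inputs):
    """The unit's net input: in = sum_j W_j * I_j."""
    return sum(w * i for w, i in zip(weights, inputs))


def train_step(weights, inputs, target, alpha):
    """Apply W_j <- W_j + alpha * I_j * Err * g'(in) to every weight."""
    net = weighted_sum(weights, inputs)
    err = target - sigmoid(net)  # assumption: Err = target - output
    g_prime = sigmoid_prime(net)
    return [w + alpha * i * err * g_prime for w, i in zip(weights, inputs)]


def squared_error(weights, inputs, target):
    """Assumed error measure: E = 1/2 * (target - output)^2."""
    return 0.5 * (target - sigmoid(weighted_sum(weights, inputs))) ** 2


if __name__ == "__main__":
    weights = [0.0, 0.0]
    inputs = [1.0, 1.0]
    before = squared_error(weights, inputs, target=1.0)
    weights = train_step(weights, inputs, target=1.0, alpha=0.5)
    after = squared_error(weights, inputs, target=1.0)
    print(weights, before, after)
```

With a small enough learning rate, each step moves the weights downhill on the squared error for that example, which is the sense in which the rule behaves like gradient descent.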

