next up previous
Next: 2.4.5 Local Minimum Problem Up: 2.4 Backpropagation Neural Networks Previous: 2.4.3 Backpropagation Processing Unit

2.4.4 Backpropagation Learning Algorithm

The backpropagation algorithm trains a given feed-forward multilayer neural network for a given set of input patterns with known classifications. When each entry of the sample set is presented to the network, the network examines its output response to the sample input pattern. The output response is then compared to the known and desired output and the error value is calculated. Based on the error, the connection weights are adjusted. The backpropagation algorithm is based on Widrow-Hoff delta learning rule in which the weight adjustment is done through mean square error of the output response to the sample input [Vel98]. The set of these sample patterns are repeatedly presented to the network until the error value is minimized.

Refer to the figure 2.12 that illustrates the backpropagation multilayer network with $ M$ layers. $ N_j$ represents the number of neurons in $ j$th layer. Here, the network is presented the $ p$th pattern of training sample set with $ N_0$-dimensional input $ X_{p1}, X_{p2},
... , X_{pN_0}$ and $ N_M$-dimensional known output response $ T_{p1}, T_{p2},
... , T_{pN_M}$. The actual response to the input pattern by the network is represented as $ O_{p1}, O_{p2},$ $ ... , O_{pN_M}$. Let $ Y_{ji}$ be the output from the $ i$th neuron in layer $ j$ for $ p$th pattern; $ W_{jik}$ be the connection weight from $ k$th neuron in layer $ (j-1)$ to $ i$th neuron in layer $ j$; and $ \delta_{ji}$ be the error value associated with the $ i$th neuron in layer $ j$.

Figure 2.12: Backpropagation Neural Network
\begin{figure}
\centerline {\epsfysize=4.0in \epsfbox{./figures/figBackprop.epsi}}\end{figure}

The following is the outline of the backpropagation learning algorithm [BJ91]:

  1. Initialize connection weights into small random values.
  2. Present the $ p$th sample input vector of pattern $ {\bf X_p} = (X_{p1},
X_{p2}, ... , X_{pN_0})$ and the corresponding output target $ {\bf T_p}
= (T_{p1}, T_{p2}, ... , T_{pN_M})$ to the network.
  3. Pass the input values to the first layer, layer 1. For every input node $ i$ in layer 0, perform:

    $\displaystyle Y_{0i} = X_{pi}.
$

  4. For every neuron $ i$ in every layer $ j = 1, 2, ..., M$, from input to output layer, find the output from the neuron:

    $\displaystyle Y_{ji} = f\left(\mathop{\sum_{k = 1}^{N_{j-1}}}{Y_{(j-1)k} W_{jik}}\right),
$

    where

    $\displaystyle f(x) = \frac{1}{1 + exp(-x)}.
$

  5. Obtain output values. For every output node $ i$ in layer $ M$, perform:

    $\displaystyle O_{pi} = Y_{Mi}.
$

  6. Calculate error value $ \delta_{ji}$ for every neuron $ i$ in every layer in backward order $ j = M, M-1, ... , 2, 1$, from output to input layer, followed by weight adjustments. For the output layer, the error value is:

    $\displaystyle \delta_{Mi} = Y_{Mi}(1 - Y_{Mi})(T_{pi} - Y_{Mi}),$ (2.10)

    and for hidden layers:

    $\displaystyle \delta_{ji} = Y_{ji}(1 - Y_{ji}) \mathop{\sum_{k=1}^{N_{j+1}}}{\delta_{(j+1)k} W_{(j+1)ki}}.$ (2.11)

    The weight adjustment can be done for every connection from neuron $ k$ in layer $ (i-1)$ to every neuron $ i$ in every layer $ i$:

    $\displaystyle W_{jik}^+ = W_{jik} + \beta \delta_{ji} Y_{ji},$ (2.12)

    where $ \beta$ represents weight adjustment factor normalized between 0 and 1. The derivation of the equations above will be discussed soon.
The actions in steps 2 through 6 will be repeated for every training sample pattern $ p$, and repeated for these sets until the root mean square (RMS) of output errors is minimized.

We now attempt to derive the error and weight adjustment equations shown above. Let's begin with the Root Mean Square (RMS) of the errors in the output layer defined as:

$\displaystyle E_p = \frac{1}{2} \mathop{\sum_{j=1}^{N_M}}{(T_{pj} - O_{pj})^2},$ (2.13)

for the $ p$th sample pattern. In generalized delta rule [BJ91,Day90,Gur97], the error value $ \delta_{ji}$ associated with the $ i$th neuron in layer $ j$ is the rate of change in the RMS error $ E_p$ respect to the sum-of-product of the neuron:

$\displaystyle \delta_{ji} = - \frac{\partial E_p}{\partial net_{ji}},$ (2.14)

where $ net_{ji}$ represents the sum-of-product value. With the chain rule, we can obtain the rate of change in the RMS error $ E_p$ in response to weight change:

\begin{displaymath}\begin{align}\frac{\partial E_p}{\partial W_{jik}} &= \frac{\...
...(j-1)k} W_{(j-1)ik} \\  &= -\delta_{ji} Y_{(j-1)k}. \end{align}\end{displaymath}

We can say that the weight change is proportional to this value above [BJ91].

$\displaystyle \Delta W_{jik} = \beta \delta_{ji} Y_{(j-1)k},$ (2.15)

where $ \beta$ is a constant.

Thus, weight change can be performed as:

$\displaystyle W_{jik}^+ = W_{jik} + \Delta W_{jik},$ (2.16)

which should match equation (2.12).

Now let's get back to the equation (2.14) to find an error value associate with the neuron. Again, using the chain rule, we get:

$\displaystyle \delta_{ji} = -\frac{\partial E_p}{\partial Y_{ji}} \frac{\partial Y_{ji}}{\partial net_{ji}}.$ (2.17)

For output layer, $ j = M$ and $ Y_{Mi} = O_{pi}$. Thus,

\begin{displaymath}\begin{align}\delta_{Mi} &= -\frac{\partial E_p}{\partial O_{...
...rac{1}{2} (T_{pi} - O_{pi})^2 \right] f'(net_{Mi}). \end{align}\end{displaymath}

Using equation (2.9),

\begin{displaymath}\begin{align}\delta_{Mi} &= (T_{pi} - O_{pi}) \left[f(net_{Mi...
...ht] \\  &= (T_{pi} - O_{pi}) (O_{pi}) (1 - O_{pi}). \end{align}\end{displaymath}

This should correspond with equation (2.10). For error values associated with the hidden layer neurons, we cannot use target values. For this reason, the part $ \partial E_p$ / $ \partial Y_{ji}$ in equation (2.21) needs to be found using a different approach. We use the chain rule applied to the sum-of-product values of neurons in the front layer (layer $ (j+1)$).

\begin{displaymath}\begin{align}\frac{\partial E_p}{\partial Y_{ji}} &= \frac{\p...
...+1}}}{\left[ -\delta_{(j+1)a} W_{(j+1)ai} \right]}. \end{align}\end{displaymath}

Finally, combined with $ \partial Y_{ji}$ / $ \partial net_{ji}$ we get:

\begin{displaymath}\begin{align}\delta_{ji} &= -\mathop{\sum_{a=1}^{N_{j+1}}}{\l...
...{j+1}}}{\left[ \delta_{(j+1)a} W_{(j+1)ai} \right]} \end{align}\end{displaymath}

This should concur with equation (2.11).


next up previous
Next: 2.4.5 Local Minimum Problem Up: 2.4 Backpropagation Neural Networks Previous: 2.4.3 Backpropagation Processing Unit
Kiyoshi Kawaguchi
2000-06-17