Boltzmann Machine

Introduction

Unlike the Hopfield model, the Boltzmann machine has a learning process. Like the neural models discussed in Chapters 1 to 6, the Boltzmann machine operates in two phases: a training phase to learn the external world, and a generalization phase to give results. Unlike the neural networks of the earlier chapters, the Boltzmann machine is a stochastic machine, very much like the Hopfield model at a temperature T. The name is derived from the Boltzmann distribution of the neural states in equilibrium statistical mechanics. The Boltzmann machine can be used to model the distribution of inputs; this distribution is obtained through a learning process.

The Boltzmann Machine

Consider a Hopfield-type neural network with an Ising spin value S_i = +1 or -1 at each node (neuron). Let the synaptic strengths w_ij be symmetric, w_ij = w_ji, so that a total energy can be meaningfully defined,

E(S) = - (1/2) Σ_ij w_ij S_i S_j.
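The energy can be evaluated directly as a double sum over all pairs of neurons. The following is a minimal sketch in plain Python; the function name `energy` and the 3-neuron weights are hypothetical, chosen only for illustration, and the diagonal w_ii is assumed to be zero.

```python
def energy(spins, weights):
    """Return E(S) = -(1/2) * sum over i, j of w_ij * S_i * S_j."""
    n = len(spins)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += weights[i][j] * spins[i] * spins[j]
    return -0.5 * total


# Hypothetical example: symmetric weights with zero diagonal.
w = [[0.0, 1.0, -0.5],
     [1.0, 0.0, 0.25],
     [-0.5, 0.25, 0.0]]
s = [1, -1, 1]
print(energy(s, w))
```

The factor 1/2 compensates for each pair (i, j) being counted twice in the double sum.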

In a free-running phase, the system runs one version of finite-temperature dynamics, such as the heat-bath algorithm, to realize a Boltzmann distribution,

P(S) proportional to exp( - E(S)/kT).
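One way to realize this distribution is a heat-bath update applied to each spin in turn. The sketch below is illustrative rather than a reference implementation. It assumes units with k = 1 and a zero diagonal w_ii = 0; the names `heat_bath_sweep` and `frozen` are hypothetical. Each spin is set to +1 with probability 1 / (1 + exp(-2 h_i / T)), where h_i = Σ_j w_ij S_j is the local field, written here in the equivalent form (1 + tanh(h_i / T)) / 2 to avoid overflow.

```python
import math
import random


def heat_bath_sweep(spins, weights, temperature, rng, frozen=frozenset()):
    """Update every non-frozen spin once, in place, with the heat-bath rule."""
    n = len(spins)
    for i in range(n):
        if i in frozen:
            continue  # frozen spins keep their current value
        # Local field on spin i from all other spins (assumes w_ii = 0).
        field = sum(weights[i][j] * spins[j] for j in range(n) if j != i)
        # Probability of S_i = +1 under the Boltzmann distribution, k = 1.
        p_up = 0.5 * (1.0 + math.tanh(field / temperature))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins


# Hypothetical example: two strongly coupled spins at low temperature.
rng = random.Random(0)
w = [[0.0, 5.0], [5.0, 0.0]]
s = [1, -1]
for _ in range(10):
    heat_bath_sweep(s, w, 0.5, rng)
print(s)
```

The `frozen` argument is included so that the same routine could also serve when some spins are held fixed by an input.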

The neurons are divided into two groups: the external (visible) neurons S^v and the internal (hidden) neurons S^i. In the learning process, the external neurons take the values of the inputs, so they act as external fields on the system and do not change. The internal neurons run the usual heat-bath algorithm. The purpose of the learning phase, of course, is to adjust the weights by the following rule:

w_ij <- w_ij + (η/T) ( <S_i S_j>_clamped - <S_i S_j>_free),

where η is a learning-rate parameter and the calculation of the equilibrium averages <...> is a time-consuming process. The clamped averages are obtained while the external (visible) spin values are fixed by the input. The free averages are the same quantity computed from samples obtained during the free-running phase. See the textbook (Haykin) for the reasoning behind and the derivation of this updating rule.
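A single application of the updating rule can be sketched as follows. This is a rough illustration under stated assumptions, not the textbook's procedure: the equilibrium averages are estimated by time-averaging over heat-bath sweeps after a burn-in, k = 1, w_ii = 0, and all names (`sweep`, `correlations`, `learning_step`, `eta`, `n_sweeps`, `burn_in`) are hypothetical. The prefactor of the rule is called `eta` in the code.

```python
import math
import random


def sweep(spins, w, T, rng, frozen):
    """One in-place heat-bath sweep over all spins not listed in `frozen`."""
    n = len(spins)
    for i in range(n):
        if i in frozen:
            continue
        field = sum(w[i][j] * spins[j] for j in range(n) if j != i)
        p_up = 0.5 * (1.0 + math.tanh(field / T))
        spins[i] = 1 if rng.random() < p_up else -1


def correlations(w, T, rng, n_sweeps, burn_in, clamp=None):
    """Estimate <S_i S_j>; `clamp` maps visible indices to fixed values."""
    n = len(w)
    clamp = clamp or {}
    spins = [clamp.get(i, rng.choice((-1, 1))) for i in range(n)]
    frozen = frozenset(clamp)
    corr = [[0.0] * n for _ in range(n)]
    for t in range(burn_in + n_sweeps):
        sweep(spins, w, T, rng, frozen)
        if t >= burn_in:  # discard early sweeps taken before equilibration
            for i in range(n):
                for j in range(n):
                    corr[i][j] += spins[i] * spins[j] / n_sweeps
    return corr


def learning_step(w, sample, T, eta, rng, n_sweeps=200, burn_in=50):
    """Apply w_ij <- w_ij + (eta/T)(<S_i S_j>_clamped - <S_i S_j>_free)."""
    clamped = correlations(w, T, rng, n_sweeps, burn_in, clamp=sample)
    free = correlations(w, T, rng, n_sweeps, burn_in)
    n = len(w)
    for i in range(n):
        for j in range(n):
            if i != j:  # keep the diagonal at zero
                w[i][j] += (eta / T) * (clamped[i][j] - free[i][j])
    return w


# Hypothetical example: neurons 0 and 1 are visible, neuron 2 is hidden.
rng = random.Random(0)
w = [[0.0] * 3 for _ in range(3)]
learning_step(w, {0: 1, 1: 1}, T=1.0, eta=0.1, rng=rng)
print(w[0][1])
```

Even in this tiny example each weight update needs two full simulations, one clamped and one free, which illustrates why the averages dominate the cost.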

We note that the input samples are supposed to be drawn from some unknown distribution. For each sample, the above update has to be done, and each update requires a long simulation to compute the averages. Thus Boltzmann machine learning is very slow and extremely CPU-time intensive. Newer models have been developed that retain the features of the Boltzmann machine but are much less expensive computationally.