In a conventional memory system, such as the RAM or hard disk of a computer, we measure the capacity of memory in bytes. The stored information is exact and physically localized at a definite location in memory, and any error that occurs destroys the integrity of the stored data.
In a neural network the information is stored in the synaptic strength matrix $W$. We may ask how much information can be stored in such a network. By conventional standards, since $W$ is an $N \times N$ matrix, it takes of order $N^2$ bytes to store $W$. Can we expect to store roughly this many bytes of information in a neural network? The answer is roughly yes. We shall show that we can store about $0.1N$ pictures (each picture containing of order $N$ bits of information) in a Hopfield network.
First of all, the information stored in the Hopfield model can be retrieved with some error; the memory is error tolerant. We therefore need to specify how much error is allowed when considering the memory capacity. In other words, given a certain level of error tolerance, how many pictures (patterns) can we store in a network? A detailed analysis of this problem is rather involved; here we give a simple argument.
A pattern is determined by
$$\mathrm{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i \quad \text{for all } i, \qquad (1)$$
where
$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^\mu x_j^\mu. \qquad (2)$$
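As a concrete illustration, here is a minimal sketch in Python with NumPy (not part of the original notes; the function names `hebbian_weights` and `is_fixed_point` are our own). It builds the weight matrix of Eq. (2) from random patterns and checks the fixed-point condition of Eq. (1):

```python
import numpy as np

def hebbian_weights(patterns):
    """Eq. (2): w_ij = (1/N) * sum_mu x_i^mu x_j^mu, for patterns of shape (p, N)."""
    _, N = patterns.shape
    return patterns.T @ patterns / N

def is_fixed_point(W, x):
    """Eq. (1): sgn(sum_j w_ij x_j) = x_i at every site i."""
    return np.all(np.sign(W @ x) == x)

rng = np.random.default_rng(0)
N, p = 1000, 50                              # load p/N = 0.05, well below capacity
patterns = rng.choice([-1, 1], size=(p, N))  # p random +-1 patterns of length N
W = hebbian_weights(patterns)
print(all(is_fixed_point(W, xi) for xi in patterns))  # True with high probability
```

At a low load such as $p/N = 0.05$, every stored pattern is typically a fixed point; the analysis below quantifies when this breaks down.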
Let the network state be exactly the first stored pattern, $x_i = x_i^1$, and ask whether Eq. (1) holds for it. We can rewrite the equation as
$$\mathrm{sgn}(h_i) = x_i,$$
where
$$h_i = \sum_j w_{ij} x_j = \frac{1}{N} \sum_j \sum_{\mu=1}^{p} x_i^\mu x_j^\mu x_j.$$
We split the summation over $\mu$ into the $\mu = 1$ term and the rest. Since $x_j = x_j^1$, we have $x_j^1 x_j = 1$, so the $\mu = 1$ term is $\frac{1}{N} \sum_j x_i^1 = x_i$. Then
$$h_i = x_i + \frac{1}{N} \sum_j \sum_{\mu \neq 1} x_i^\mu x_j^\mu x_j.$$
As long as the second term is smaller than the first in magnitude, the equation $\mathrm{sgn}(h_i) = x_i$ is satisfied. We define
$$C_i = -\frac{1}{N} \sum_j \sum_{\mu \neq 1} x_i x_i^\mu x_j^\mu x_j, \qquad (3)$$
so that $h_i = x_i(1 - C_i)$. The requirement that we recover the pattern is $C_i < 1$ for all $i$. The $C_i$ depend on the particular set of patterns we wish to store. In order to obtain a quantitative result, we make an assumption about the patterns: we consider purely random patterns, with equal probability for $x_j^\mu = +1$ and $-1$, independently for each $j$ and $\mu$. Then the probability that a particular site $i$ is in error is given by
$$P_{\mathrm{error}} = \mathrm{Prob}(C_i > 1).$$
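This probability is easy to check numerically before doing any analysis. The sketch below (ours, not from the notes) samples random patterns, evaluates the crosstalk $C_i$ of Eq. (3) at one site, and tabulates its mean, variance, and tail probability. As a simplifying assumption we drop the deterministic $j = i$ self-term, so that every summand is an independent $\pm 1$ exactly as the argument below assumes:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 500, 100, 4000             # load p/N = 0.2, chosen so errors are visible

vals = np.empty(trials)
for t in range(trials):
    x = rng.choice([-1, 1], size=(p, N))  # x[0] plays the role of the stored pattern x^1
    i = 0                                 # examine site i = 0
    # Eq. (3) with x = x^1; the deterministic j = i self-term is dropped (it is
    # small for p << N) so that every summand is an independent +-1
    overlaps = x[1:] @ x[0] - x[1:, i] * x[0, i]   # sum_{j != i} x_j^mu x_j^1
    vals[t] = -(x[0, i] * (x[1:, i] @ overlaps)) / N

print("mean:    ", vals.mean())             # close to 0
print("variance:", vals.var())              # close to p/N = 0.2
print("Prob(C_i > 1):", (vals > 1).mean())  # compare with the Gaussian estimate below
```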
We now estimate this probability analytically. View Eq. (3) as a sum of random terms, each equal to $+1$ or $-1$ and each carrying a coefficient $1/N$; there are $N(p-1)$ terms in the summation. Clearly the mean value of $C_i$ is zero. The variance of $C_i$ is the variance of a single $\pm 1$ variable (which is 1) times $N(p-1)/N^2$, which is approximately $p/N$. Here we have used a theorem in probability about the variance of a sum of independent random variables:
$$\mathrm{var}(a_1 X_1 + a_2 X_2 + \cdots) = a_1^2 \,\mathrm{var}(X_1) + a_2^2 \,\mathrm{var}(X_2) + \cdots$$
Next, we invoke the central limit theorem and approximate the distribution of $C_i$ by a Gaussian with mean zero and variance $\sigma^2 = p/N$:
$$P(C_i = x) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right).$$
Thus the probability that $C_i$ is greater than 1 is given by
$$P_{\mathrm{error}} = \int_1^\infty \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx = \frac{1}{2}\left[1 - \mathrm{erf}\!\left(\sqrt{\frac{N}{2p}}\right)\right].$$
Here $\mathrm{erf}(\,)$ is the error function.
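In Python the closed-form estimate is a one-liner (`math.erfc` is the complementary error function, $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$). With the same $N$ and $p$ as in the Monte Carlo sketch above, the two estimates should agree:

```python
import math

def p_error(N, p):
    """Gaussian estimate: P_error = [1 - erf(sqrt(N/(2p)))] / 2 = erfc(sqrt(N/(2p))) / 2."""
    return 0.5 * math.erfc(math.sqrt(N / (2 * p)))

print(p_error(500, 100))   # load p/N = 0.2  ->  about 0.013
```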
Some representative values:

$P_{\mathrm{error}}$    $p_{\max}/N$
0.001                   0.105
0.0036                  0.138
0.01                    0.185
0.05                    0.37
0.1                     0.61
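These entries can be reproduced by inverting the formula for $P_{\mathrm{error}}$ numerically; here is a sketch using simple bisection (the helper names are ours, not from the notes). The last entry prints as 0.609, which the table rounds to 0.61.

```python
import math

def p_error(alpha):
    """Single-site error probability at load alpha = p/N."""
    return 0.5 * math.erfc(math.sqrt(1.0 / (2.0 * alpha)))

def alpha_max(target):
    """Largest load alpha with p_error(alpha) <= target, found by bisection
    (p_error is monotonically increasing in alpha)."""
    lo, hi = 1e-6, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_error(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for target in (0.001, 0.0036, 0.01, 0.05, 0.1):
    print(target, round(alpha_max(target), 3))   # 0.105, 0.138, 0.185, 0.37, 0.609
```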