In a conventional memory system, such as the RAM or a harddisk of a computer, we measure the capacity of memory in units of bytes. The information stored is precise and also physically localized to a definite location in memory. Any errors that may occur destroy the integrity of the system.

In a neural network the information is stored in the synaptic
strength matrix W. We could also ask the question of how much
information can be stored in the network. By the conventional standards,
since W is an N x N matrix, it needs order of N^{2} bytes to store
W. Can we expect that we can roughly store this "bytes" of
information in a neural network. The answer is roughly yes. We shall
show that we can store about 0.1*N pictures (each picture containing order N
bytes of information) in a Hopfield network.

First of all, the information stored in the Hopfield's model can be retrieved with some error. So the memory is error tolerant. Thus we need to say how much error is allowed when considering the memory capacity. In other words, given a certain level of error tolerance, how many pictures (patterns) can we store in a network. Detailed analysis of the problem is rather involved. Here we give a simple argument.

A pattern is determined by

sgn(S_{j}
w_{ij} x_{j}) = x_{i}
(for all i)
(1)

where

w_{ij} =
(1/N) S^{p}_{u=1} x^{u}_{i}
x^{u}_{j }
(2)

Let x_{i} be the pattern
solution associated with the exact stored input x^{1}_{i}.
We can rewrite the equation as

sgn(h_{i})
= x_{i}

where

h_{i} =
S_{j}
w_{ij} x_{j} =
(1/N) S_{j}S^{p}_{u=1} x^{u}_{i}
x^{u}_{j}x_{j}

We split the summation over u to a term for u = 1, and the rest. Then

h_{i} =
x_{i} + (1/N) S_{j}S^{p}_{u
=/= 1} x^{u}_{i}
x^{u}_{j}x_{j}

As long as the second term is smaller than the first term in
magnitude, the equation sgn(h_{i}) = x_{i}
is satisfied. We define

C_{i} = - (1/N)
S_{j}S^{p}_{u
=/= 1} x_{i} x^{u}_{i}
x^{u}_{j}x_{j }
(3)

so that h_{i} = x_{i}(
1 - C_{i}). The requirement that we recover the pattern is C_{i}
< 1 for all i. We see that they depend on the set of patterns we wish to
store. In order to have a quantitative result, we make some assumption
about the patterns. We consider purely random pattern, with equal
probability for x^{u}_{i}= +1 and -1
independent of j and u. Thus the probability of a particular site i is in
error is given by

P_{error }=
Prob(C_{i} > 1).

We need to make an estimate of this probability. We
view Eq.(3) as sum of random numbers of +1 or -1. There are N(p-1) terms
of summation of +1 and -1. Clearly, the mean value of C is
zero. The variance of C is variance of +1 and -1 (which is 1), times
N*(p-1)/N^{2 } and is approximately p/N. Here we have used a
theorem in probability about variance

var( a_{1}
x_{1} + a_{2} x_{2} + ...) = a^{2}_{1}
var(x_{1}) + a^{2}_{2} var(x_{2}) + ...

Next, we evoke the law of large numbers in probability and
approximate the distribution of C_{i} with a Gaussian distribution with mean zero
and variance p/N.

P(C_{i})
approximately equals to 1/ ( sqrt(2 p) s)
exp( - x^{2}/(2s^{2})).

Thus the probability that C_{i} is greater than 1 is
given by

Perror = integral(from 1
to infinity) 1/ ( sqrt(2 p) s)
exp( - x^{2}/(2s^{2}))
dx

=[ 1- erf(sqrt(N/(2p)) )]/2

Here erf( ) is the error function.

Some of the values

Perror Pmax/N

0.001 0.105

0.0036 0.138

0.01 0.185

0.05 0.37

0.1 0.61