In a conventional memory system, such as the RAM or hard disk of a computer, we measure the capacity of memory in bytes. The stored information is exact and physically localized at a definite location in memory, and any error that occurs destroys the integrity of the stored data.
In a neural network the information is stored in the synaptic strength matrix $W$. We may ask how much information can be stored in such a network. By conventional standards, since $W$ is an $N \times N$ matrix, it takes of order $N^2$ bytes to store $W$. Can we expect to store roughly this many bytes of information in a neural network? The answer is roughly yes. We shall show that we can store about $0.1N$ pictures (each picture containing of order $N$ bits of information) in a Hopfield network.
First of all, the information stored in the Hopfield model can be retrieved with some error; the memory is error tolerant. We therefore need to specify how much error is allowed when considering the memory capacity. In other words, given a certain level of error tolerance, how many pictures (patterns) can we store in a network? A detailed analysis of this problem is rather involved; here we give a simple argument.
A pattern is determined by
$$\mathrm{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i \quad \text{for all } i, \qquad (1)$$
where
$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^\mu x_j^\mu. \qquad (2)$$
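As a concrete illustration, here is a minimal sketch in Python with NumPy (not part of the original notes; the function names `hebbian_weights` and `is_fixed_point` are our own). It builds the weight matrix of Eq. (2) from random patterns and checks the fixed-point condition of Eq. (1):

```python
import numpy as np

def hebbian_weights(patterns):
    """Eq. (2): w_ij = (1/N) * sum_mu x_i^mu x_j^mu, for patterns of shape (p, N)."""
    _, N = patterns.shape
    return patterns.T @ patterns / N

def is_fixed_point(W, x):
    """Eq. (1): sgn(sum_j w_ij x_j) = x_i at every site i."""
    return np.all(np.sign(W @ x) == x)

rng = np.random.default_rng(0)
N, p = 1000, 50                              # load p/N = 0.05, well below capacity
patterns = rng.choice([-1, 1], size=(p, N))  # p random +-1 patterns of length N
W = hebbian_weights(patterns)
print(all(is_fixed_point(W, xi) for xi in patterns))  # True with high probability
```

At a low load such as $p/N = 0.05$, every stored pattern is typically a fixed point; the analysis below quantifies when this breaks down.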
Let the network state be exactly the first stored pattern, $x_i = x_i^1$, and ask whether Eq. (1) holds for it. We can rewrite the equation as
$$\mathrm{sgn}(h_i) = x_i,$$
where
$$h_i = \sum_j w_{ij} x_j = \frac{1}{N} \sum_j \sum_{\mu=1}^{p} x_i^\mu x_j^\mu x_j.$$
We split the summation over $\mu$ into the $\mu = 1$ term and the rest. Since $x_j = x_j^1$, we have $x_j^1 x_j = 1$, so the $\mu = 1$ term is $\frac{1}{N} \sum_j x_i^1 = x_i$. Then
$$h_i = x_i + \frac{1}{N} \sum_j \sum_{\mu \neq 1} x_i^\mu x_j^\mu x_j.$$
As long as the second term is smaller than the first in magnitude, the equation $\mathrm{sgn}(h_i) = x_i$ is satisfied. We define
$$C_i = -\frac{1}{N} \sum_j \sum_{\mu \neq 1} x_i x_i^\mu x_j^\mu x_j, \qquad (3)$$
so that $h_i = x_i(1 - C_i)$. The requirement that we recover the pattern is $C_i < 1$ for all $i$. The $C_i$ depend on the particular set of patterns we wish to store. In order to obtain a quantitative result, we make an assumption about the patterns: we consider purely random patterns, with equal probability for $x_j^\mu = +1$ and $-1$, independently for each $j$ and $\mu$. Then the probability that a particular site $i$ is in error is given by
$$P_{\mathrm{error}} = \mathrm{Prob}(C_i > 1).$$
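This probability is easy to check numerically before doing any analysis. The sketch below (ours, not from the notes) samples random patterns, evaluates the crosstalk $C_i$ of Eq. (3) at one site, and tabulates its mean, variance, and tail probability. As a simplifying assumption we drop the deterministic $j = i$ self-term, so that every summand is an independent $\pm 1$ exactly as the argument below assumes:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 500, 100, 4000             # load p/N = 0.2, chosen so errors are visible

vals = np.empty(trials)
for t in range(trials):
    x = rng.choice([-1, 1], size=(p, N))  # x[0] plays the role of the stored pattern x^1
    i = 0                                 # examine site i = 0
    # Eq. (3) with x = x^1; the deterministic j = i self-term is dropped (it is
    # small for p << N) so that every summand is an independent +-1
    overlaps = x[1:] @ x[0] - x[1:, i] * x[0, i]   # sum_{j != i} x_j^mu x_j^1
    vals[t] = -(x[0, i] * (x[1:, i] @ overlaps)) / N

print("mean:    ", vals.mean())             # close to 0
print("variance:", vals.var())              # close to p/N = 0.2
print("Prob(C_i > 1):", (vals > 1).mean())  # compare with the Gaussian estimate below
```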
We now estimate this probability analytically. View Eq. (3) as a sum of random terms, each equal to $+1$ or $-1$ and each carrying a coefficient $1/N$; there are $N(p-1)$ terms in the summation. Clearly the mean value of $C_i$ is zero. The variance of $C_i$ is the variance of a single $\pm 1$ variable (which is 1) times $N(p-1)/N^2$, which is approximately $p/N$. Here we have used a theorem in probability about the variance of a sum of independent random variables:
$$\mathrm{var}(a_1 X_1 + a_2 X_2 + \cdots) = a_1^2 \,\mathrm{var}(X_1) + a_2^2 \,\mathrm{var}(X_2) + \cdots$$
Next, we invoke the central limit theorem and approximate the distribution of $C_i$ by a Gaussian with mean zero and variance $\sigma^2 = p/N$:
$$P(C_i = x) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right).$$
Thus the probability that $C_i$ is greater than 1 is given by
$$P_{\mathrm{error}} = \int_1^\infty \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx = \frac{1}{2}\left[1 - \mathrm{erf}\!\left(\sqrt{\frac{N}{2p}}\right)\right].$$
Here $\mathrm{erf}(\,)$ is the error function.
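In Python the closed-form estimate is a one-liner (`math.erfc` is the complementary error function, $\mathrm{erfc}(z) = 1 - \mathrm{erf}(z)$). With the same $N$ and $p$ as in the Monte Carlo sketch above, the two estimates should agree:

```python
import math

def p_error(N, p):
    """Gaussian estimate: P_error = [1 - erf(sqrt(N/(2p)))] / 2 = erfc(sqrt(N/(2p))) / 2."""
    return 0.5 * math.erfc(math.sqrt(N / (2 * p)))

print(p_error(500, 100))   # load p/N = 0.2  ->  about 0.013
```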
Some representative values:

$P_{\mathrm{error}}$    $p_{\max}/N$
0.001                   0.105
0.0036                  0.138
0.01                    0.185
0.05                    0.37
0.1                     0.61
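These entries can be reproduced by inverting the formula for $P_{\mathrm{error}}$ numerically; here is a sketch using simple bisection (the helper names are ours, not from the notes). The last entry prints as 0.609, which the table rounds to 0.61.

```python
import math

def p_error(alpha):
    """Single-site error probability at load alpha = p/N."""
    return 0.5 * math.erfc(math.sqrt(1.0 / (2.0 * alpha)))

def alpha_max(target):
    """Largest load alpha with p_error(alpha) <= target, found by bisection
    (p_error is monotonically increasing in alpha)."""
    lo, hi = 1e-6, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p_error(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for target in (0.001, 0.0036, 0.01, 0.05, 0.1):
    print(target, round(alpha_max(target), 3))   # 0.105, 0.138, 0.185, 0.37, 0.609
```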