Robert Morris's algorithm for counting large numbers using 8-bit registers is an early example of a sketch or data structure for efficiently processing a data stream. I introduce the algorithm and analyze its probabilistic behavior.
Published
11 November 2019
An approximate counting algorithm is a procedure for counting a large number of events using a small amount of memory. Approximate counting was invented by Robert Morris while he was working at Bell Labs (Morris, 1978). His motivation was that he wanted to count a large number of events using only an 8-bit register. In general, an $n$-bit register can count up to $2^n - 1$ events, and so an 8-bit register can only count 255 events. Since Morris was only interested in approximate counts, he developed a probabilistic algorithm with a quantifiable margin of error. This algorithm is an early example of a sketching algorithm, an algorithm that compresses a streaming data set for efficient querying.
Before we work through Morris's algorithm, let's follow his reasoning in the paper. A simple approach would be to count each event with probability $p$. Let $X_n$ be a random variable for the approximate count after $n$ events; this is the value we store in our counter, not the true number of events. Then clearly
$$
X_n \sim \text{Binomial}(n, p), \qquad \mu = \mathbb{E}[X_n] = np, \qquad \sigma^2 = \mathbb{V}[X_n] = np(1-p).
$$
Morris's example is that for $p = 1/2$, after 400 events, $\mu = 200$ and $\sigma = 10$, and our estimate of the true number of events is $2 X_n$. So 95% of the time, our estimate is within $2(2\sigma) = 40$ of the actual count, or 10% error.
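We can check this reasoning with a minimal simulation of the naive scheme (the function names here are mine, for illustration):

```python
import random

def naive_count(n_events, p=0.5, rng=random):
    """Count each event independently with probability p.
    The estimate of the true count is X / p, i.e. 2 * X for p = 1/2."""
    x = 0
    for _ in range(n_events):
        if rng.random() < p:
            x += 1
    return x

rng = random.Random(0)
# Repeat Morris's 400-event example many times and average the estimates.
estimates = [2 * naive_count(400, rng=rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
```

With $p = 1/2$, any single estimate has standard deviation $2\sigma = 20$, but the sample mean of many runs should land very close to 400.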
The problem with this approach is that it can have high relative error for small counts, where
$$
\text{Relative error} = \left| 1 - \frac{\mu_{\text{approx.}}}{\mu} \right|.
$$
For example, with $p = 1/2$, the probability that all of the first three events are counted, giving an estimate of 6 and thus 100% relative error, is $p^3 = 1/8$. The goal, then, is to develop an approximate counting algorithm whose relative error is nearly independent of $n$. This leads to Morris's algorithm.
Morris’s algorithm
Imagine that the value $X_n$ stored in a counter is
$$
X_n = \log_2(1 + n)
$$
where $n$ is the number of events that have occurred so far. The plus one just ensures that $n = 0 \implies X_n = 0$. At any given time, our best estimate of $n$ is
$$
\hat{\theta}_n = 2^{X_n} - 1.
$$
To increment the counter, we compute
$$
X_{n+1} = \log_2(1 + 2^{X_n}).
$$
That is, our new estimate is $\hat{\theta}_{n+1} = \hat{\theta}_n + 1 = 2^{X_n}$. We offset this estimate by one and take the log. However, this value isn't an integer in general. If we round the value, we might accumulate errors in our approximation. Instead, we can do the following. First, initialize $X_0 = 0$. Next, for $n > 0$, compute
$$
p = \frac{1}{2^{X_n}}, \qquad r \sim \text{Uniform}(0, 1).
$$
Then the value of $X_{n+1}$ is
$$
X_{n+1} = \begin{cases} X_n + 1 & \text{if } r < p, \\ X_n & \text{else.} \end{cases}
$$
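The update rule above can be sketched in a few lines of Python (a minimal sketch; the function names are mine, not Morris's):

```python
import random

def morris_increment(x, rng=random):
    """Increment the register x with probability 2**-x."""
    if rng.random() < 2.0 ** -x:
        x += 1
    return x

def morris_count(n_events, rng=random):
    """Stream n_events through a Morris counter; return the register X_n."""
    x = 0
    for _ in range(n_events):
        x = morris_increment(x, rng)
    return x

rng = random.Random(0)
n = 1000
registers = [morris_count(n, rng) for _ in range(500)]
estimates = [2 ** x - 1 for x in registers]
mean_estimate = sum(estimates) / len(estimates)
```

The register values stay near $\log_2(n + 1) \approx 10$, so the counter fits comfortably in 8 bits, while any individual estimate $2^{X_n} - 1$ is noisy; averaging many independent counters recovers $n$.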
Why is this a reasonable estimate? Because the expected value of $2^{X_n} - 1$ is $n$. We can prove this by induction. I follow Jelani Nelson's lecture notes on sketching algorithms. The base case is when $n = 0$. Clearly, $\mathbb{E}[2^{X_0} - 1] = 0$ by our initialization criteria. Now assume that $\mathbb{E}[2^{X_n}] = n + 1$. Conditioning on the value of $X_n$, the counter is incremented with probability $2^{-X_n}$, and so
$$
\mathbb{E}[2^{X_{n+1}}] = \sum_i P(X_n = i) \left( \frac{1}{2^i} \cdot 2^{i+1} + \left( 1 - \frac{1}{2^i} \right) 2^i \right) = \sum_i P(X_n = i) \left( 2^i + 1 \right) = \mathbb{E}[2^{X_n}] + 1 = n + 2,
$$
completing the induction. Thus $\mathbb{E}[\hat{\theta}_n] = \mathbb{E}[2^{X_n} - 1] = n$, and our estimator is unbiased.
To bound the variance of our estimator of $n$, we just need to compute $\mathbb{E}[2^{2X_n}]$, since $\mathbb{V}[\hat{\theta}_n] = \mathbb{E}[2^{2X_n}] - (\mathbb{E}[2^{X_n}])^2$. First, note that we can express the probability that $X_n$ takes value $i$ as
$$
P(X_n = i) = \frac{1}{2^{i-1}} P(X_{n-1} = i - 1) + \left( 1 - \frac{1}{2^i} \right) P(X_{n-1} = i).
$$
In words, this is just the probability that $X_{n-1}$ was $i - 1$ and was incremented, plus the probability that $X_{n-1}$ was $i$ but was not incremented. Then we can write the expectation above as
$$
\mathbb{E}[2^{2X_n}] = \sum_i 2^{2i} P(X_n = i) = \mathbb{E}[2^{2X_{n-1}}] + 3 \, \mathbb{E}[2^{X_{n-1}}] = \mathbb{E}[2^{2X_{n-1}}] + 3n.
$$
Unrolling this recurrence from $\mathbb{E}[2^{2X_0}] = 1$ gives $\mathbb{E}[2^{2X_n}] = 1 + \sum_{i=1}^n 3i$.
The sum is just a finite arithmetic series, and it's easy to check that $\sum_{i=1}^n 3i = \frac{3}{2} n(n+1)$. Thus,
$$
\mathbb{E}[2^{2X_n}] = \frac{3}{2} n(n+1) + 1,
$$
and the variance of our estimator is
$$
\mathbb{V}[\hat{\theta}_n] = \mathbb{E}[2^{2X_n}] - (\mathbb{E}[2^{X_n}])^2 = \frac{3}{2} n^2 + \frac{3}{2} n + 1 - (n+1)^2 = \frac{n(n-1)}{2} \leq \frac{n^2}{2}.
$$
While the variance of our estimator increases as a function of $n$, we can use this bound to control the probability that $2^{X_n} - 1$ deviates from its expected value (Figure 1). I believe this is what Morris meant when he said that this algorithm bounds relative error.
Figure 1. Morris's algorithm on $n = 1024$ data points. The dotted line is the true counts, while the black line is Morris's estimate $2^{X_n} - 1$.
Bounded tail probabilities
To bound the tail probabilities, we invoke Chebyshev's inequality (Markov's inequality applied to the squared deviation),
$$
P\left[ \left| \hat{\theta}_n - n \right| \geq \varepsilon n \right] \leq \frac{\mathbb{V}[\hat{\theta}_n]}{\varepsilon^2 n^2} \leq \frac{1}{2 \varepsilon^2}.
$$
The second inequality follows from the upper bound we just computed on the variance, namely $\mathbb{V}[\hat{\theta}_n] \leq n^2 / 2$. In words, the probability that our estimate of $n$ deviates far from the true value of $n$ is bounded. Of course, the tail bound is on deviations of size $\varepsilon n$, which grow with $n$. We can see in Figure 1 that as $n$ increases, the deviation from the true value also increases.
Space complexity
The number of bits required to store $X_n$ is roughly $\log_2(X_n)$. Since $\mathbb{E}[2^{X_n}] = n + 1$, the register value $X_n$ is around $\log_2(n + 1)$ on average. Thus,
$$
\log_2(X_n) \approx \log_2 \log_2 (n + 1) = O(\log \log n)
$$
on average.
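To see this concretely, here is an illustrative check (assuming the probabilistic increment described above): after a million events, an exact counter needs 20 bits, while the Morris register sits near $\log_2(10^6) \approx 20$ and so needs only a handful of bits.

```python
import random

rng = random.Random(0)
n = 10 ** 6
x = 0  # the Morris register
for _ in range(n):
    # Increment with probability 2**-x.
    if rng.random() < 2.0 ** -x:
        x += 1

bits_exact = n.bit_length()   # bits to store the exact count n
bits_morris = x.bit_length()  # bits to store the Morris register
```

This is the observation behind Morris's original motivation: the register easily fits in 8 bits even for counts far beyond 255.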
Morris, R. (1978). Counting large numbers of events in small registers. Communications of the ACM, 21(10), 840–842.
Flajolet, P. (1985). Approximate counting: a detailed analysis. BIT Numerical Mathematics, 25(1), 113–134.