This is a considerably lower-level post than usual, which I’ll (following Terry Tao) also blame on the holidays; there’s another, even less mathematical post in the works which I hope to finish sometime tomorrow.

How many times do you need to flip a coin before you expect to see both heads and tails? How many times do you need to roll a die before you expect to see all the numbers 1-6? These are two instances of the coupon collector’s problem. Wikipedia gives not one, but two nice solutions to the problem, but there’s an even nicer “back-of-the-envelope” calculation which gives you the correct asymptotics for virtually nothing, and (I like to think) shows the power of thinking “categorically” at even a very low level.

So let’s give a statement of the problem. A company — say Coca-Cola, for concreteness — is holding a contest where everyone who collects one each of n different “coupons” wins some prize. You get a coupon with each purchase of a Coke, and each coupon is equally likely. What’s the expected number of Cokes you have to buy in order to collect all the coupons?

If you do some experimentation (or calculation) with small instances, you’ll see that this number seems to be growing somewhat faster than n. For n = 2, for example, the expected number is 3, and for n = 3 it’s 5.5. But how much faster? Like $n^2$? Or just a constant times n?
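If you'd like to do that experimentation by machine, here is a minimal Python sketch. The function names are mine, and the exact value relies on the standard fact that, with k distinct coupons in hand, the wait for a new one averages n/(n-k) purchases; none of this is needed for the argument that follows.

```python
import random
from fractions import Fraction


def expected_draws(n: int) -> Fraction:
    """Exact expected number of purchases needed to see all n coupons.

    With k distinct coupons in hand, a new one shows up with probability
    (n - k) / n, so the wait for it averages n / (n - k) purchases.
    """
    return sum(Fraction(n, n - k) for k in range(n))


def simulate_draws(n: int, rng: random.Random) -> int:
    """Buy Cokes until all n coupons have appeared; return how many it took."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each coupon is equally likely
        draws += 1
    return draws


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an arbitrary choice
    trials = 2_000
    for n in (2, 3, 6, 20):
        average = sum(simulate_draws(n, rng) for _ in range(trials)) / trials
        # Columns: n, exact expectation, simulated average.
        print(n, float(expected_draws(n)), round(average, 2))
```

Dividing the exact column by n makes the "somewhat faster than n" growth easy to see.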

None of the above, as it happens, and you might have already guessed (or known) that the correct order of growth is $n \log n$. Here’s how you can figure this out for yourself.

Think of the collection as a function from Coke bottles to (equivalence classes of) coupons. If we’ve collected all the coupons, the function is surjective. So we can rephrase “What is the probability that, after I buy m Coke bottles, I have collected all n coupons” as “What is the probability that a random function from a set with m elements to a set with n elements is surjective?” Actually, we’ll estimate the probability that it’s *not* surjective.
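For very small m and n, the rephrased question can be answered by brute force: list every function from an m-element set to an n-element set and count the surjective ones. The sketch below does exactly that. The name `surjective_probability` is made up for illustration, and the enumeration visits n^m functions, so it's only meant for toy sizes.

```python
from fractions import Fraction
from itertools import product


def surjective_probability(m: int, n: int) -> Fraction:
    """Probability that a uniformly random function [m] -> [n] is surjective.

    A function is represented as a tuple of length m whose i-th entry is
    the coupon that came with the i-th bottle.
    """
    total = 0
    onto = 0
    for f in product(range(n), repeat=m):
        total += 1
        if len(set(f)) == n:  # the image is the whole codomain
            onto += 1
    return Fraction(onto, total)


if __name__ == "__main__":
    for m in range(1, 9):
        print(m, surjective_probability(m, 3))
```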

If the function isn’t surjective, then its image contains at most n-1 elements. Fixing a subset of n-1 elements of the codomain, the probability that our random function takes a given element of the domain to this subset is $(n-1)/n = 1 - 1/n$. Since the appropriate events are all independent, the probability that the random function takes every element to the subset is therefore $(1 - 1/n)^m$.

Now there are n possibilities for the subset of size n-1, so we apply the union bound and say that the probability that our random function is not surjective is at most $n(1 - 1/n)^m$. (Of course, this is an upper bound, and there is an error term; but we’ll return to that in a bit.)
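To get a feel for how much the union bound gives away, here is a sketch that compares the bound $n(1 - 1/n)^m$ with the exact probability of missing a coupon. The exact value comes from a small dynamic program over "number of distinct coupons held so far", which is my choice of method for the check and not part of the argument; the function names are likewise made up.

```python
from fractions import Fraction


def miss_probability(m: int, n: int) -> Fraction:
    """Exact probability that m purchases fail to cover all n coupons."""
    # dist[k] = probability of holding exactly k distinct coupons so far.
    dist = [Fraction(0)] * (n + 1)
    dist[0] = Fraction(1)
    for _ in range(m):
        nxt = [Fraction(0)] * (n + 1)
        for k, p in enumerate(dist):
            if p == 0:
                continue
            nxt[k] += p * Fraction(k, n)  # a coupon we already hold
            if k < n:
                nxt[k + 1] += p * Fraction(n - k, n)  # a new coupon
        dist = nxt
    return 1 - dist[n]


def union_bound(m: int, n: int) -> Fraction:
    """The upper bound n * (1 - 1/n)^m from the union-bound step."""
    return n * Fraction(n - 1, n) ** m


if __name__ == "__main__":
    n = 20
    for m in (40, 60, 80, 100, 120):
        # Columns: m, exact miss probability, union bound.
        print(m, float(miss_probability(m, n)), float(union_bound(m, n)))
```

Note that for small m the bound exceeds 1 and says nothing; it only becomes informative once m is large enough.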

So we want this expression to be smaller than, say, 1/10, which means that $(1 - 1/n)^m < 1/(10n)$. But when n is large, we have that $(1 - 1/n)^n$ is about 1/e, so $(1 - 1/n)^m$ is about $e^{-m/n}$ and m has to be on the order of $n \log n$!
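To make that last step concrete, here is the same estimate written out as a sketch. It uses the standard inequality $1 - 1/n \le e^{-1/n}$, which isn't spelled out above, and 1/10 is just the arbitrary threshold already chosen.

```latex
\begin{align*}
n\left(1 - \frac{1}{n}\right)^m
  &\le n\,e^{-m/n}
  && \text{since } 1 - \tfrac{1}{n} \le e^{-1/n}, \\
n\,e^{-m/n} < \frac{1}{10}
  &\iff m > n \ln(10n) = n \ln n + n \ln 10 .
\end{align*}
```

So any m larger than $n \ln n + n \ln 10$ pushes the bound below 1/10, and a different threshold only changes the constant in front of the linear term.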

Now we’ll backtrack a bit. How do we know that the union bound was reasonably tight? After all, we counted functions whose image had size n-2 twice! Well, if you go back through the analysis and do inclusion-exclusion, you’ll see that the probability of missing a coupon winds up being close to 1 when m is much smaller than $n \log n$. But I don’t know of a computation-free way to argue that $n \log n$ is asymptotically right! Does anyone else?
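For reference, here is the inclusion-exclusion computation written out, as a sketch and certainly not the computation-free argument being asked for. The events $A_i$ ("coupon i never shows up") are my notation.

```latex
\begin{align*}
\Pr[\text{not surjective}]
  = \Pr\Big[\bigcup_{i=1}^{n} A_i\Big]
  &= \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \left(1 - \frac{k}{n}\right)^m \\
  &\ge n\left(1 - \frac{1}{n}\right)^m - \binom{n}{2}\left(1 - \frac{2}{n}\right)^m, \\
\binom{n}{2}\left(1 - \frac{2}{n}\right)^m
  &\le \frac{1}{2}\left(n\,e^{-m/n}\right)^2 .
\end{align*}
```

The second line truncates the alternating sum after two terms (a Bonferroni inequality), and the last line uses $\binom{n}{2} \le n^2/2$ together with $1 - 2/n \le e^{-2/n}$. For large n the main term is roughly $n\,e^{-m/n}$, so the pairs we double-counted cost at most about half its square: when the union bound is around 1/10, the overcounting is on the order of 1/200.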

So how is this “categorical thinking?” Well, it’s not, really. Category theory only really starts to get mildly interesting when you talk about functors, and doesn’t come into its own right until natural transformations are introduced. But if you’ve learned to think categorically, you see morphisms where other people see objects — in this case, a function where others might see a set — and while this is rarely enough to apply abstract-nonsense tools, it is enough to broaden your intuition and see paths you might have otherwise missed. And this is at least as useful.