The importance of choosing the right model

I’ve had the idea for this post bouncing around in my head for several months, and now that I don’t have classes to worry about I can finally get around to writing it. I want to talk about some “pathological examples” in computer science, in particular in complexity and computability theory.

I wanted to post a picture of William H. Mills with this post, in the spirit of Dick Lipton’s blog. Unfortunately I can’t find one! Mills was a student of Emil Artin in the ’40s; he finished his thesis in 1949 and promptly (as far as I can tell) disappeared until the ’70s, when we find some work by a William H. Mills on combinatorial designs. After this, there’s nothing of note until Dr. Mills passed away several years ago at the age of 85.

While his work (assuming it’s his) on combinatorial designs looks interesting, W.H. Mills’ small place in history is assured by a short and unassuming paper from 1947, published when he was still a student. In this paper he showed that there’s a constant, which he called A, such that \lfloor A^{3^n} \rfloor (the integer part of A^{3^n}) is prime for all integers n \geq 1. Nowadays it’s called Mills’ constant.

This is a formula for prime numbers, something that hundreds of amateur and professional mathematicians have sought in vain for centuries! So why isn’t Mills more famous? Well, partially because this isn’t actually a very deep result. It’s difficult, yeah, but it follows directly from earlier work of Ingham, which implies that there’s a prime between n^3 and (n+1)^3 for every sufficiently large n (and that’s all the argument needs). So we can construct a sequence of primes, which starts off 2, 11, 1361, …, by taking a_n to be the smallest prime greater than a_{n-1}^3, and then take A to be the limit of a_n^{1/3^n}; everything works out nicely, as the sketch below shows. So the formula doesn’t tell us anything special about the primes, or even use anything that special: you could do the same thing for any sufficiently dense set of integers.
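To make the construction concrete, here’s a minimal Python sketch of the sequence just described. The trial-division primality test and the variable names are mine, not anything from Mills’ paper, but the first few terms come out as advertised.

```python
# A minimal sketch of the construction above: a_1 = 2, and each a_n is the
# smallest prime greater than a_{n-1}^3.  Trial division is slow but more
# than enough for the first few terms.

def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def mills_primes(count):
    primes = [2]                       # a_1 = 2
    while len(primes) < count:
        a = primes[-1] ** 3 + 1        # search upward from a_{n-1}^3 + 1
        while not is_prime(a):
            a += 1
        primes.append(a)
    return primes

primes = mills_primes(4)
print(primes)                          # [2, 11, 1361, 2521008887]

# The lower bounds a_n^(1/3^n) increase toward Mills' constant A ~ 1.3064.
for n, p in enumerate(primes, start=1):
    print(n, p ** (1.0 / 3 ** n))
```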

The other reason why Mills’ formula isn’t as exciting as it seems is the main topic of this post, and I’ll come back to it in a bit. But first, let’s switch gears and talk graph theory.

So the first thing we need is the notion of a graph minor. Briefly, and hand-wavingly, we say that H is a minor of G if we can split some of the vertices of H “in half,” replacing each with a pair of adjacent vertices that divide the original vertex’s neighbors between them, and thereby obtain a subgraph of G. This seems like a rather arbitrary notion, until you see that the “vertex-splitting” doesn’t affect the topological properties of the graph, like the ability to draw it in the plane. And in fact there’s a classic result, Wagner’s theorem, which characterizes the planar graphs as exactly the ones with no minor isomorphic to either K_5, the complete graph on 5 vertices, or K_{3,3}, the complete bipartite graph with two sets of 3 vertices.
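As a quick sanity check on that characterization, here’s a small sketch using the networkx library (assuming you have it installed). Its check_planarity routine tests planarity directly, via an embedding algorithm rather than minor testing, but by the theorem above the two viewpoints have to agree.

```python
# A small sanity check with networkx.  check_planarity works by finding (or
# failing to find) a planar embedding, not by minor testing, but by Wagner's
# theorem the answers must match the forbidden-minor characterization.
import networkx as nx

K5 = nx.complete_graph(5)
K33 = nx.complete_bipartite_graph(3, 3)

print(nx.check_planarity(K5)[0])    # False: K_5 is not planar
print(nx.check_planarity(K33)[0])   # False: K_{3,3} is not planar

# Deleting any single edge from K_5 leaves a planar graph, which is at least
# consistent with K_5 being minor-minimal among the non-planar graphs.
K5_minus_edge = K5.copy()
K5_minus_edge.remove_edge(0, 1)
print(nx.check_planarity(K5_minus_edge)[0])  # True
```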

From there, you might naturally ask if the graphs that can be drawn on a torus, or any surface, can be characterized by a set of “forbidden minors.” To make a very long story very short, the answer turns out to be “yes.” This was proved by Robertson and Seymour in a massive proof spanning twenty papers and two and a half decades. Obviously even sketching the proof is beyond the scope of this blog (or my personal ability), so I’m just going to focus on one consequence of it.

In one of their many papers, Robertson and Seymour described a polynomial-time algorithm for testing whether a fixed graph H is a minor of G. The running time, to be clear, is polynomial (cubic, to be precise) in the size of G; the hidden constant depends on H. Since the question of whether G can be drawn on a given surface reduces to testing whether it has any of a finite set of forbidden minors, there is therefore a cubic-time algorithm to test whether G can be drawn on any fixed surface.

The thing is, the implied constant is HUGE!

To illustrate, the known forbidden minors for just the torus already number in the thousands, and the complete list still hasn’t been determined. So that’s thousands of O(n^3) algorithms we have to run, many of which themselves hide really, really big constants in the big-oh notation. And this is (almost) the simplest non-planar case! In fact, we have reason to believe that the implied constants can explode at Ackermann-type rates as H grows, which makes the algorithm pretty much useless in practice. And computing the simplest surface on which a graph can be drawn is NP-complete, so unless P = NP the dependence on the surface certainly grows superpolynomially.

So does this mean that the “polynomial time = efficient” paradigm is wrong or useless? Of course not. The vast majority of polynomial-time algorithms we use in practice have small exponents and small implied constants and are, in fact, reasonably efficient; the vast majority of exponential-time algorithms quickly become hopelessly slow. But situations like this one are a very good reminder to think about the limitations of your model.

Even more interesting is the following statement: the existence of this particular family of polynomial-time algorithms is NOT a theorem of Peano arithmetic. In fact, while you don’t need the full power of ZFC to prove Robertson and Seymour’s theorem, you do need a rather large fragment of it. So we have an unusual situation where a polynomial-time algorithm exists, but Peano arithmetic can’t prove that it exists! (Actually, there may be a different polynomial-time algorithm whose existence is provable in Peano arithmetic, but as far as I know no one has a clue what it would look like.) What would be really fascinating would be a situation like the following: some natural problem admits a (family of) polynomial-time algorithms, but the existence and/or correctness of the algorithms isn’t even provable in ZFC.

(Incidentally, one can describe an algorithm which solves 3SAT and runs in polynomial time iff P = NP. It’s not that difficult to come up with, so I’ll leave it as an exercise. But it’s possible that P = NP is true but not provable in ZFC, in which case that algorithm would have the property described.)

Anyway, as promised, let’s get back to Mills’ constant. We proved the constant’s existence by constructing the sequence of prime numbers that it gives us. But this is the only way we know of to compute it — so if we want to use Mills’ constant to compute a sequence of primes, we have to know the sequence in advance!

But maybe there is some other way to compute Mills’ constant. For all we know, it could even be rational, although it’s very probably not. And if we could compute A, we could compute some large primes. In fact, we might even be able to compute large prime numbers in polynomial time, if we could get the bits of A fast enough.
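To see the circularity concretely, here’s a small sketch using Python’s decimal module. It approximates A from the fourth term of the sequence above, which is the only source of digits of A we have, and then “recovers” the earlier primes by taking \lfloor A^{3^n} \rfloor; the precision setting and the choice to stop at a_4 are just artifacts of the demo.

```python
# Sketch of "reading primes out of A", using Python's decimal module for high
# precision.  The catch, as noted above, is that our only source of digits of
# A is the prime sequence itself: here A is approximated from a_4.
from decimal import Decimal, getcontext

getcontext().prec = 60                        # ample working precision

a = [2, 11, 1361, 2521008887]                 # a_1..a_4 from the search above

# a_4^(1/81) is a lower bound for A that already matches it to several digits.
A_approx = Decimal(a[-1]) ** (Decimal(1) / Decimal(81))
print(A_approx)                               # roughly 1.30637788...

# Recover the earlier primes as floor(A^(3^n)) for n = 1, 2, 3.
for n in (1, 2, 3):
    print(n, int(A_approx ** (3 ** n)))       # 2, then 11, then 1361

# To go past a_4 this way we would need more digits of A than we can produce
# without first finding a_5, which is the whole problem.
```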

One way of getting those bits is to introduce an oracle. An oracle is just a black box that you can feed some input into and get a bit (or any number of bits, it doesn’t much matter) out of in constant time. Of course these things don’t exist in the real world, but they’re very useful in theory! It’s often possible to show two contradictory results in the presence of different oracles; for instance, Baker, Gill, and Solovay showed that relative to one oracle P = NP, while relative to a different one P \neq NP. It follows that no “proof” of P \neq NP that would still go through in the presence of every oracle (we say such a proof “relativizes”) can be correct.
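Oracles don’t exist as hardware, but you can at least mimic the interface in code: below, the “oracle” is just a callable that the algorithm treats as a black box (here it’s impersonated by brute force, which of course gives the game away). The example is the classic search-to-decision trick, not anything specific from this post: given an oracle that merely answers “is this formula satisfiable?”, you can recover an actual satisfying assignment with roughly one query per variable.

```python
# A black-box "oracle" as a Python callable, here impersonated by brute force.
# The search-to-decision trick below recovers a satisfying assignment using
# only yes/no answers about satisfiability.
from itertools import product

def evaluate(cnf, assignment):
    # cnf: list of clauses; each clause is a list of nonzero ints, where
    # literal v means "variable v is True" and -v means "variable v is False".
    return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
               for clause in cnf)

def brute_force_oracle(cnf, n):
    # Stand-in for a SAT oracle on variables 1..n: one bit out, as promised.
    return any(evaluate(cnf, dict(zip(range(1, n + 1), bits)))
               for bits in product([False, True], repeat=n))

def find_assignment(cnf, n, oracle):
    # Fix the variables one at a time, asking the oracle whether the formula
    # stays satisfiable; about one query per variable in total.
    if not oracle(cnf, n):
        return None
    assignment = {}
    for v in range(1, n + 1):
        if oracle(cnf + [[v]], n):      # still satisfiable with v = True?
            cnf, assignment[v] = cnf + [[v]], True
        else:
            cnf, assignment[v] = cnf + [[-v]], False
    return assignment

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
example = [[1, 2], [-1, 3], [-2, -3]]
print(find_assignment(example, 3, brute_force_oracle))   # {1: True, 2: False, 3: True}
```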

The thing about oracles is that you can do all sorts of crazy things with them — they aren’t bound by the usual laws of computation. For instance, you can assume the existence of an oracle that solves the halting problem! An oracle for Mills’ constant wouldn’t do anything that crazy, since it’s computable, but it would give us a deterministic polynomial-time algorithm to compute large primes, which isn’t currently known to exist. (There are plenty of reasons to believe that such algorithms do exist, though — we just can’t prove it yet.)

If you don’t like oracles, though, you can turn to “real computation,” which is pretty much exactly what it sounds like. Instead of working with bits, models of real computation (the Blum-Shub-Smale machine is the standard one) let us work with infinite-precision real numbers! So in this case we don’t have to worry about silly numerical issues; we just exponentiate.

But there’s a subtlety here, too. If you’re given an infinite-precision real number, you can read off its binary (or ternary, or decimal…) expansion: multiply by the appropriate power of 2 and look at the parity of the integer part. Conversely, if you have an oracle that takes in positive integers as input, you can pack all of its answers into the digits of a single real number. So real numbers can function as oracles! In fact, there’s a famous number called Chaitin’s constant which, in a sense, “encodes the halting problem.” But there are problems that even Turing machines with a halting oracle can’t solve. And there’s an oracle that can solve some of those! And there’s an oracle that can solve problems too hard for Turing machines with that second oracle! And so on, and so forth. (That said, real computation can’t solve every problem, since the Cantor-Gödel-Turing diagonalization argument still works. There are bizarre ways to get around this, but discussing them would take us too far afield.)
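Here’s a small sketch of that packing, using exact rational arithmetic so that, in spirit at least, nothing is lost to rounding: a set of natural numbers becomes the real number whose n-th binary digit records membership, and multiplying by the right power of 2 reads an answer back out. The perfect-squares “oracle” and the finite cutoff are just stand-ins so the demo runs; a halting-problem oracle would be packed the same way, we just couldn’t compute the resulting real.

```python
# Packing an "oracle" (a set of natural numbers) into a single real number and
# reading it back out, as described above.  Fractions keep the arithmetic
# exact; the finite cutoff is only there so the demo terminates.
from fractions import Fraction

def pack(oracle_set, cutoff):
    # Encode membership of 0..cutoff-1 as the binary digits of a real in [0, 1).
    return sum(Fraction(1, 2 ** (n + 1)) for n in oracle_set if n < cutoff)

def query(x, n):
    # Read the n-th binary digit of x: multiply by 2^(n+1) and take the parity
    # of the integer part.
    return int(x * 2 ** (n + 1)) % 2 == 1

# Pretend the oracle answers "is n a perfect square?"; any set would do.
squares = {n * n for n in range(32)}
x = pack(squares, cutoff=1000)

print([n for n in range(20) if query(x, n)])   # [0, 1, 4, 9, 16]
```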

So, what’s the point of all this? Just that computational models are tricky beasts — oftentimes what seems intuitive can breed strange behavior and small changes can wreak havoc with computational power.

There’s one last thing I can’t help mentioning, though, mostly because it’s so cool. There are a fair number of (what we believe to be) “universal constants,” independent of any units we might use to measure them. Stuff like the fine structure constant, or the ratio of the mass of a proton to the mass of an electron. What’s fascinating is that there’s no real a priori reason why these constants should be computable numbers — and, in fact, there’s nothing that prevents the fine structure constant from being an oracle for the halting problem!

So, is there any chance we can use the fundamental laws of physics to beat Turing machines? Unfortunately, no. These fundamental constants are, in a sense, private variables; the same laws that rely on them prevent us from being able to measure them to arbitrary precision, thanks to stuff like the uncertainty principle, the observer effect, and especially the Bekenstein bound. It’s almost as if the universe is conspiring to make the Church-Turing thesis true; other people, like the lamentedly non-blogging Scott Aaronson, would go even further and say that the universe conspires to make solving NP-complete problems intractable. (Well, maybe with less malicious intent on the universe’s part.) Of course, I don’t recommend you read too much into these kinds of ideas (that way lies crankiness), but it’s a fascinating thing to sit and ponder every once in a while.


3 Responses to “The importance of choosing the right model”

  1. Qiaochu Yuan Says:

    I don’t see how a statement like “the fine structure constant is not a computable real” has any content if it can’t be measured to arbitrary precision anyway.

    • Harrison Says:

      Well, it doesn’t end up having any content, I agree. (Although I imagine you could argue that there is some “Platonic value” for the fine structure constant…) My point was, it’s conceivable that the universe could allow us to measure such things to arbitrary precision, and then such statements would start to be meaningful. I don’t think I made that as clear as I intended to, though.

  2. colinwytan Says:

    Harrison, where did you read about real computation? Is there published work on computability theory with real numbers instead of natural numbers? It seems that polymath2 would need some real computation.
