GILA 2: The probabilistic approach

To begin, a historical curiosity. In the last post, I talked about Khinchin’s little book on number theory, which from today’s perspective is one of the earliest books to deal entirely with additive combinatorics and additive number theory. One of Khinchin’s most famous results has to do with the denominators of continued fractions; Khinchin’s original proof of the theorem was quite complicated and involved. Nowadays, the existence of Khinchin’s constant is understood as a simple consequence of the ergodicity of the Gauss-Kuzmin-Wirsing operator.

Where else do we see ergodic theory popping up to simplify complicated proofs? Why, in additive combinatorics! Although we won’t discuss it in this series, ergodic theory has become a central tool for Szemeredi-type theorems, and some results are still only provable by such methods (for instance, until recently, the density Hales-Jewett theorem was in this class.)

To top it all off, note that ergodic theory was originally motivated by problems in statistical physics. Guess who wrote one of the best-known textbooks on mathematical statistical physics? But on to the math!

We’ve seen that traditional arguments based on the pigeonhole principle and such aren’t sufficient to prove Roth’s theorem. So we’ll go to the next tool in our kit — the probabilistic method.

The probabilistic method is one of the most useful weapons in a combinatorialist’s arsenal, even having a whole textbook devoted to it. (The linked page is out of date; a third edition was published last year.) The core of the method is this — if we want to show the existence of an object with a given property, instead of constructing one directly, we instead show that a randomly chosen large enough object has the property with positive probability. Its best-known use in Ramsey theory is for proving lower bounds, where it often gives the best known results, but clever applications can give upper bounds as well, for instance in the proof of Kraft’s inequality. Of particular interest to us is the fact that it provides our first nontrivial proof in the direction of Roth’s theorem.

Recall the statement of (the finitary version of) Roth’s theorem:

Let $\delta > 0$. Then there exists some integer $N$ such that any subset of $\{1, 2, \ldots, n\}$ with $n > N$ and at least $\delta n$ elements contains a three-term arithmetic progression.

We’re going to use the union bound to avoid the fact that our subset can be very pseudo-random, very non-random, or somewhere in between, which was on some level preventing us from using pigeonhole-style arguments. The catch? We’ll only be able to prove Roth’s theorem for sufficiently high densities.

Theorem. Let $\delta > 2/3$. Then there exists some integer $N$ such that any subset of $\{1, 2, \ldots, n\}$ with $n > N$ and at least $\delta n$ elements contains a three-term arithmetic progression. (In fact, we can take N = 3.)

Proof. Pick a random three-term arithmetic progression from $\{1, 2, \ldots, n\}$. Then the probability that the first term is not in the subset is $1-\delta < 1/3$. Similarly, the probability that the second and third terms are not in the subset are each at most $1/3$. So the probability that at least one of them is not in the subset is strictly less than $1/3+1/3+1/3 = 1$; in particular, there is a nonzero probability that the arithmetic progression is contained in the subset.

(Aside: In the infinitary version, using Schnirelmann density, this result is trivial, since any set of Schnirelmann density greater than $2/3$ will necessarily contain the AP $1, 2, 3$. The advantage of this approach is that it extends to the finitary version.)

Unfortunately, it’s pretty much immediately clear that the union bound won’t let us do better than the above. So we move to my personal Favorite Tool Ever, the fantastic property of the universe known as linearity of expectation.

I could probably spend my life cataloguing all the myriad ways linearity of expectation is useful in combinatorics and theoretical CS. The wonderful thing about it is that it applies to any collection of random variables, no matter how complicated the web of dependencies between them might be, and it (unlike, for instance, the union bound) is an equality, so we can often get very tight results. There could likely be books written on linearity of expectation, but my current favorite unexpected application of it comes to us via Gil Kalai.

But enough praise-singing. Can we get better partial Roth’s theorems using linearity of expectation? Well, note that we can immediately rephrase our above proof in terms of linearity of expectation, which also tells us why we can’t do better with the union bound alone.

We’d like to show that the expected number of members of our subset in a random 3-term AP is strictly greater than 2. Unfortunately, here we see that linearity of expectation’s greatest strength can also be its greatest failing; to do better than $3\delta$ for this value, we’d need to sample from some weird distribution where the density is higher, because we can’t take advantage of any relationships between the members of the AP.

But linearity of expectation isn’t the end of the line for the probabilistic method. There are more powerful tools still —  maybe we could take advantage of them? You could spend weeks trying, and you’d probably have fun, but I’ll just go ahead and spoil the punchline: No. At least not without an additional insight.

The reason is essentially that we can do better — significantly better — than the probabilistic method at finding a set of integers with no 3-term AP. The usual construction is due to Behrend (until quite recently, it was the best-known construction!) and relies on the simple fact that a sphere is convex and therefore has no solutions to $u+v = 2w$ (with $u, v, w$ vectors.) Behrend cleverly finds a way to transplant this insight to the integers, and gets a subset of $\{1, 2, \ldots, n\}$ of density $e^{-c\sqrt{log n}}$ without any 3-term AP.

Behrend’s counterexample, to quote Ben Green and Terry Tao’s Scholarpedia article, “strongly suggests that the proof of [Roth’s conjecture] must be somehow ‘iterative’ in nature.” So now what? Well, let’s take a couple of steps back.

Earlier we said that, in order to get the linearity of expectation/union bound proof working, we’d have to sample from a distribution that had “higher density.” What might such a distribution look like? Well, to start with, we’d want it to be supported on a long arithmetic progression, since otherwise we’d have to correct for the fact that our APs might not fall into it. Furthermore, to simplify the calculations, we’ll suppose for now that it’s uniformly distributed over its support.

Okay, so let’s try passing to a random AP. We’re going to cheat slightly and think about the infinitary version, where we can get a slightly better intuition, even though there are some technical difficulties involved in “picking a random integer.” So suppose that our subset of the integers has density $\delta$. We want to find the density of a subset on an AP of the form $r, r+d, r+2d, \ldots$, with $r < d$. Somewhat encouragingly, if we fix $d$ and let $r$ vary, we see that the average density of our subset on these APs is exactly $\delta$ — so either our subset is uniformly distributed $(mod~d)$ or we can pass to an arithmetic progression with higher density.

Unfortunately, there are situations where the subset is uniformly distributed $(mod~d)$. I’ll give two examples. First, suppose our subset is chosen “randomly.” Then it’s easy to see that it’s uniformly distributed $(mod~d)$ for every $d$. Of course, we have a heuristic argument that says that these should have arithmetic progressions anyway, so that’s not too much of a problem.

Here’s the other example, which showcases the structure/randomness dichotomy once more. Suppose our subset is, for instance, $\{7, 14, 21, 28, \ldots\}$. Then if $d$ is relatively prime to 7, we can’t pass to a higher-density AP with difference $d$ — on the other hand, if d = 7, we’re done! So in general, we need a tool that lets us pick the right common difference to pass to the AP; we can’t quite do it probabilistically.

We still don’t have a proof, but we’re farther along than we’ve yet been. I’d like to sketch an argument which we think, at this point, might give us the desired result.

Sketch. First, we need to make rigorous our intuition that random enough sets of positive density will have length-3 APs. The obvious way to do this is by showing the contrapositive. We want to find some suitable “randomness measure” such that random dense sets have high randomness measure. Now let $S$ be a positive-density set without 3-APs; we want to show that this condition is enough to guarantee that $S$ has low randomness measure.

Now for our modified-probabilistic argument. We want to show that “low-randomness” sets can’t be uniformly distributed (mod d) for every d. Optimally, we also want to introduce a tool that tells us how much a set “looks like” an arithmetic progression with common difference d; this is because we need to obtain quantitative bounds for the density-increment argument, in order to ensure that the densities of our set on the sequence of APs don’t monotonically approach, say, 0.0000001.

Okay, spoiler time again — if you don’t want to know where this is going, skip the next few paragraphs. This idea, it turns out, can be made to work. Even better, the “randomness measure” and the tool we need to compute the right common difference are essentially the same; both are related to the Fourier coefficients of our set.

With this in mind, the next few installments of GILA are going to be more specialized than Nos. 1 and 2 were. I plan to split the proof into two parts, as outlined in the sketch. Meanwhile, as either GILA 2.5 or GILA 3.5, I plan to derive the discrete Fourier transform “from scratch” as it applies to this problem — that installment won’t be necessary to following the series, but it’ll be fun, and maybe even useful for seeing how Fourier analysis and additive combinatorics relate. End of spoilers.

GILA 2.5 or GILA 3 will appear by the middle of next week — for real this time (unless I catch swine flu).

The word count at the end of the last paragraph was 1729, and tempted though I am to leave it there, I have a few things I want to say. First of all, it’s not entirely true that iterative arguments of the type presented are necessary. Croot and Sisask present a proof of Roth’s theorem with very little iteration, although it’s still fairly similar to the Fourier-analytic proof.

Second: Although, to the best of my knowledge, no strictly probabilistic proof of Roth’s theorem is known, I’d be remiss not to mention the polymath proof. The polymath proof establishes the k=3 case of the density version of the Hales-Jewett theorem, which in particular implies Roth’s theorem. While I’m not intimately familiar with the details of the proof, my impression is that they also proceed by a density-increment strategy, but sidestep the need for Fourier analysis by the combination of a very clever choice of distribution and elementary arguments inspired by ergodic theory.

See you next week!

10 Responses to “GILA 2: The probabilistic approach”

1. GILA 3: The Fourier transform, and why it works « Portrait of the Mathematician Says:

[…] Portrait of the Mathematician Uninformed opinions on math and more « GILA 2: The probabilistic approach […]

2. Blas Says:

In the Theorem, when you say that “(In fact, we can take N=3)”, it is not true. We have A={1,2,4} in {1,2,3,4} or A={1,2,4,5} in {1,2,3,4,5} as counterexamples. I think that the right number would be N=5, if I am not wrong.
Also, I do not understand very well how the probabilistic proof of the Theorem ends. If a random subset of {1,…,n} of density >2/3 contains a random 3-AP with positive probability, I do not see how can we deduce that EVERY subset with that density contains one.
In any case, I agree with the Theorem; I can prove without probabilistic methods that given delta>2/3 we have an N such that every subset of {1,..,n} with n>N and at least delta·n elements contains three consecutive integers.
But if you could explain more clearly the probabilistic proof and how it guarantees the existence of N satisfying the Theorem, that would throw some light…
Many thanks.

3. blogs Says:

Inform potential clients of the the huge benefits that
the goods can offer them. Get back, perhaps
the lowest firms are able to get available online for
and also be significant. They focus on all the parts which
are important for internet marketing.

4. raspberry ketone supplement Says:

I think the admin of this web page is truly
working hard in favor of his web page, because here every data is quality based stuff.

5. pure raspberry ketone gnc Says:

Hey there excellent website! Does running a blog like this require a massive amount work?
I have no knowledge of computer programming however I was hoping to
start my own blog in the near future. Anyway, should you have any suggestions or techniques for new blog owners please share.

I understand this is off topic but I simply had to

6. raspberry keytones Says:

My brother recommended I might like this blog. He was entirely right.
This submit truly made my day. You cann’t consider just how much time I had spent for this information! Thanks!

7. ruralvets.org.nz Says:

I am in fact grateful to the owner of this site who has shared this
enormous post at at this place.

8. natural sleeping aids Says:

Your mode of describing everything in this paragraph is genuinely nice, every one be able to easily be
aware of it, Thanks a lot.

9. Clash of Clans Town Hall 8 Says:

Hey there, I think your website might be having