## GILA 3: The Fourier transform, and why it works

At the end of the last post, we outlined an approach to proving Roth’s theorem via a density-increment method. I’ll re-post it here, a little more fleshed out, for convenience.

Sketch. The idea is to deal with “structured sets” (like sets that are themselves long arithmetic progressions) and “pseudo-random sets” in different ways; we saw last time that the density-increment argument looks promising for structured sets, but “random” sets won’t succumb to it.

First, we need to make rigorous our intuition that “random enough” sets of positive density will have length-3 APs. The obvious way to do this is by showing the contrapositive. We want to find some suitable “randomness measure” such that random dense sets have high randomness measure. Now let $S$ be a positive-density set without 3-APs; we want to show that this condition is enough to guarantee that $S$ has low randomness measure.

Now for our modified-probabilistic argument. We want to show that “low-randomness” sets can’t be uniformly distributed (mod d) for every d; this guarantees the existence of an AP on which it has higher density. We pass to this AP; rinse, lather, and repeat, until our density rises above some pre-set bound ($\delta > 2/3$ will do nicely.) We also want to introduce a tool that tells us how much a set “looks like” an arithmetic progression with common difference d; this is because we need to obtain quantitative bounds for the density-increment argument, in order to ensure that the densities of our set on the sequence of APs don’t monotonically approach, say, 0.0000001.

So we’re looking for two technical tools; one that quantifies the “randomness” of a set, and one that lets us correlate our set with a candidate arithmetic progression to pass to so that our density-increment argument will actually work quantitatively. As it turns out, though, we can find one tool that will do both jobs for us; the discrete Fourier transform.

Okay, so I’m getting ahead of myself here; I haven’t really motivated the DFT’s usefulness in this situation yet. This is partially because I’m lazy, but mostly because, if you aren’t at least a little familiar with the basic concepts of Fourier analysis, it really does pretty much come out of left field. (In another post, possibly an appendix or not an official GILA post at all, I’ll take a shot at “building the DFT to order” for this purpose, but not now.) So first I’ll give a crash course on the general goals and principles of Fourier analysis.

Let’s say that you’re trying to play the opening motif (“Fate knocking at the door”) of Beethoven’s Fifth, but all you have is a set of tuning forks. Well, the first note is a G — specifically, as long as we’re grossly oversimplifying things, it’s the G above middle C. If you take out your tuning fork for this note and hit it with a pen, it’ll emit almost a pure tone — a sine wave of about 392 Hz. If you’re anything like me, you have an image in your mind of this sound wave which looks something like this:

(That’s actually a 440 Hz wave, and so would be the opening note of Beethoven’s Fifth Ninth Symphony. And if you don’t get the joke, it means you’re either not overthinking it or aren’t immediately familiar with the key signatures of Beethoven’s symphonies, both of which are probably good things.) Of course, this image is totally not a good representation of the physical reality of the sound wave, but it’s useful nonetheless, so let’s keep it in mind. First of all, note that this wave is really determined by only two parameters — the frequency, 392 Hz, and the amplitude. The phase is unimportant in what we’re considering. Furthermore, the frequency has an obvious “visual” interpretation; it’s the reciprocal of the distance between two consecutive peaks of the graph.

This is all well and good, but the first sounds of a symphony orchestra playing Beethoven won’t be pure sine waves. There are instruments playing an octave lower; the different instruments have a characteristic timbre, with overtones and undertones and all sorts of other things. The image in your mind should be something like this:

Now, it’s not clear geometrically that this has a well-defined frequency or amplitude; even if you make it periodic by repeating it over and over, although there’s a period and a wavelength, it’s certainly not determined by this wavelength. But the key insight of Fourier analysis is that you can decompose a periodic function into a sum of simple trigonometric functions, which are characterized by their frequency alone. We do this by means of the Fourier transform, which takes a “time-domain” representation of a function and converts it to its “frequency domain.” If the transformed version $\hat{f}$ of a function $f$ takes on a large value at $\xi$, then $f$ is in a sense correlated with $sin(\xi t)$.

Hopefully the connection to Roth’s theorem is becoming clear — we want to find a tool that tells us, loosely, how much our set is correlated with some arithmetic progression of difference $d$, and because of the averaging argument, we don’t care about the “phase.” So looking for an analogy of the Fourier transform seems promising.

As it happens, the Fourier transform in general isn’t just defined on real- or complex-valued functions of the real line, but can be abstractly extended to functions on other groups. There are still two problems, though: First, we’re currently considering a set and not a function. This is simple to resolve, however, since we can just think of our set $S$ as “really” being its characteristic function $1_S$, which takes the value 1 on elements of $S$ and 0 elsewhere.

Second, and more subtly, neither the integers nor $\{1, 2, \ldots, N\}$ seem to have useful analogues of the Fourier transform — the latter isn’t even a group, and the former doesn’t allow the right notion of periodicity in the case of Roth’s theorem. But, as it turns out, the cyclic group $\mathbb{Z}/N \mathbb{Z}$ (which I’ll henceforth denote as the easier-to-TeX $\mathbb{Z}_N$) does have a good notion of the Fourier transform, and we can transform the finitary version of Roth’s theorem into a statement about this group. However, this transformation is rather tedious and is best carried out at the end anyway, so I won’t describe it here. Instead, here’s the definition of the discrete Fourier transform at the heart of the proof:

Let $f$ be a complex-valued function on $Z_N$. Then the discrete Fourier transform $\hat{f}$ is defined by

$\hat{f}(j) = \sum_{k=0}^{N-1} f(k) e^{-\frac{2\pi i j k}{N}}$.

(Aside: The Fourier transform can be seen as a generalization and modification of basic forms of the probabilistic method — indeed, the coefficient $\hat{f}(0)$ is just $\delta N$ when f is the characteristic function of our set. This is seen even more effectively if we normalize the transform by dividing by N. I could present the arguments using this modified DFT, and it wouldn’t seriously affect the proof, but it’s clearer in the long run to stick with the usual definition, and that’s what I’ll do.)

There are a few basic properties of this transform which we’re going to use; I won’t prove them here, since proofs are quite easy to find.

Plancherel’s identity. $\sum_x |f(x)|^2 = \frac{1}{N}\sum_j |\hat{f}(j)|^2$.

(Aside: This is one of the most basic and fundamental theorems in Fourier analysis, and shows up everywhere — for instance, it can be considered to be the crucial ingredient in Shor’s quantum factoring algorithm. It’s well worth knowing, in both its discrete and continuous versions.)

The more general identity $\sum_x f(x)\overline{g(x)} = \frac{1}{N}\sum_j \hat{f}(j)\overline{\hat{g}(j)}$,

and its immediate corollary, that the Fourier transform of $h(x) = \sum_y f(y)\overline{g(y-x)}$ is $\hat{f}(k)\overline{\hat{g}(k)}$.

We also use the inequality $|\hat{f}(k)| \leq \sum_x |f(x)|$, which is clear from the definition.

I had hoped to give half of the proof of Roth’s theorem in this post, but it’s getting rather long (and it’s also rather later than I had wanted to post it), so I’ll save it for the next post, and just give a preview here. I hope that I’ve presented this material in such a way that it seems believable that the DFT allows one to estimate the correlation, or linear bias, with an AP of some fixed distance (at least if you believe that the DFT shares some of the essential properties of the standard Fourier transform). Here I’ll try to explain how it can also give a useful notion of randomness of the set.

(By the way, from here on out, except possibly for the “appendices”, I’ll be mostly following Neil Lyall’s excellent exposition, which is designed to be accessible for undergraduates without giving up any of the technical detail. I’ll  still be using Tao and Vu and some other expositions of the theorem as well, but if for some reason you must stop reading this series here, you wouldn’t lose too much by switching to Lyall.)

So we’re working over the finite group $\mathbb{Z}_N$, and we have some subset $S$ with density $\delta$. We can assume, essentially without loss of generality, that N is odd. We’ll again switch from existence to counting (which is one of the most generally applicable tricks in all of combinatorics, and which we’ve used before and will use again) and try to compute $N_0$, the number of solutions to $x + y = 2z$ (mod N). (Note: Here we are including trivial arithmetic progressions $x, x, x$, as it simplifies the calculations greatly to do so. Since N is odd, these are the only degenerate APs, and it turns out that, since there are only $\delta N$ of them, we’ll be able to essentially regard them as an insignificant error term.)

We see “by inspection” (of the complex plane) that $\frac{1}{N} \sum_k e^{\frac{-2 \pi i x k}{N}}$ is equal to 1 if $x = 0$ (mod N), and is 0 elsewhere — this gives us a nice “characteristic function” for congruences (mod N), which is amenable to Fourier analysis. So we have

$N_0 = \sum_x \sum_y \sum_z \frac{1}{N} \sum_k e^{\frac{-2 \pi i (x+y-2z) k}{N}}$,

where $x, y, z$ range over $S$. Then denoting the characteristic function of $S$ by $\widehat{1_S}$, one can check that the RHS of the above is equal to

$\frac{1}{N} \sum_k \widehat{1_S}(k)^2 \widehat{1_s}(-2k)$.

Since, as noted way above, $\widehat{1_s}(0)$ is $\delta N$, we can view the above as saying that $N_0 = \delta^3 N^2$ plus an error term related to the other Fourier coefficients of $S$. Now if all the remaining Fourier coefficients are small, the magnitude of the error term isn’t enough to cancel out the leading term $\delta^3 N^2$; we’ll show this all more rigorously, and more motivatedly, next time. Fortuitously, it’s also straightforward to check that if our set is chosen randomly, the Fourier coefficients will in general be quite small. So we’ll use the following definition:

$S$ is $\epsilon$-pseudorandom if $\widehat{1_S}(j) \leq \epsilon N$ for $j \neq 0$ (mod N).

As it turns out, this is suitable for our approach to Roth’s theorem, and we’ll see this in the next post.

I don’t want to make any promises I won’t be able to keep, so I can’t say for sure when GILA 4 will be up. (It doesn’t help that we’re starting now to get into material I don’t fully grok.) However, with a bit of luck it should be up by late next week, potentially with a non-GILA post in between.