This post also provides a good opportunity to refine some heuristics I had previously proposed regarding Siegel zeroes and their impact on various problems in analytic number theory. In this previous blog post, I wrote

Conjecture 1 (Finite time blowup) There exists a manifold and a smooth solution to the Euler equations that blows up at some finite time .

This alters the effect of sieving in such residue classes. Consider for instance the classical sieve of Eratosthenes. If one sieves out for each prime , the sieve of Eratosthenes tells us that the surviving elements of are simply the primes between and , of which there are about many. However, if one restricts attention to for a quadratic residue class (and taking to be somewhat large compared to ), then by the preceding discussion, this eliminates most primes, and so now sieving out should leave almost no survivors. Shifting this example by and then dividing by , one can end up with an example of an interval of length that can be sieved by residue classes for each in such a manner as to leave almost no survivors (in particular, many). In the presence of a Siegel zero, it seems quite difficult to prevent this scenario from “infecting” the above problem, creating a bad scenario in which for all , the residue classes for already eliminate almost all elements of , leaving it mathematically possible for the remaining survivors to either be prime, or eliminated by the remaining residue classes for .
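
To make the classical case concrete, here is a minimal Python sketch (with an arbitrary choice of x, purely for illustration) of the sieve of Eratosthenes viewed as sieving out one residue class, namely 0 mod p, for each prime p up to the square root of x; the survivors above the square root are exactly the primes in the upper range, of which there are about x/log x.

```python
import math

def sieve_by_residue_classes(x):
    """Sieve {1, ..., x} by the residue class 0 mod p for each prime p <= sqrt(x).

    The survivors above sqrt(x) are exactly the primes in (sqrt(x), x],
    of which there are about x / log x by the prime number theorem.
    """
    limit = math.isqrt(x)
    alive = [True] * (x + 1)
    alive[0] = alive[1] = False
    for p in range(2, limit + 1):
        if alive[p]:  # p is prime: it has not been hit by any smaller prime
            for multiple in range(p, x + 1, p):  # the class 0 mod p, including p itself
                alive[multiple] = False
    return [n for n in range(limit + 1, x + 1) if alive[n]]

x = 10**4  # an arbitrary choice of x for illustration
survivors = sieve_by_residue_classes(x)
print(len(survivors), round(x / math.log(x)))
```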

8 September, 2024 in math.AP, math.MP | Tags: finite time blowup, incompressible Euler equations | by Terence Tao | 7 comments

I actually agree with Granville here, and propose a synthesis of the two situations. In the absence of a Siegel zero, standard heuristic models in analytic number theory (such as the ones discussed in this post) typically suggest that a given quantity of interest in number theory (e.g., the number of primes in a certain set) obeys an asymptotic law of the form

Ben Krause, Hamed Mousavi, Joni Teräväinen, and I have just uploaded to the arXiv the paper “Pointwise convergence of bilinear polynomial averages over the primes“. This paper builds upon a previous result of Krause, Mirek, and myself, in which we demonstrated the pointwise almost everywhere convergence of the ergodic averages as and almost all , whenever is a measure-preserving system (not necessarily of finite measure), and , for some with , where is a polynomial with integer coefficients and degree at least two. Here we establish the prime version of this theorem, that is to say we establish the pointwise almost everywhere convergence of the averages

The basic strategy is to try to insert the weight everywhere in the proof of the convergence of (1) and adapt as needed. The weighted averages are bilinear averages associated to the bilinear symbol

For related reasons, it is also challenging to incorporate assistance from modern AI tools into a research project, as these tools can “hallucinate” plausible-looking but nonsensical arguments, which therefore need additional verification before they can be added to the project.

Problem 1 (Erdös #135) Let be a set of points such that any four points in the set determine at least five distinct distances. Must determine many distances?

I have been playing with one approach to this conjecture, which reduces to solving a certain underdetermined system of partial differential equations, and then establishing some stability result for the resulting solution. However, I have not been able to make headway on solving this latter system despite its underdetermined nature; so I thought I would record my partial attempt here in case anyone is interested in pursuing it further (and also to contribute to the practice of sharing unsuccessful attempts to solve a problem, which is still quite infrequently done in our community).

A key case is when for some . Here, the residue classes for sieve out everything in except for primes and semiprimes, and specifically the semiprimes that are product of two primes between and . If one can show for some that the largest gap between semiprimes in say with prime factors in is , then this would affirmatively answer the first part of this problem (and also the second). This is certainly very plausible – it would follow from a semiprime version of the Cramér conjecture (and this would also make the more precise prediction ) – but remains well out of reach for now. Even assuming the Riemann hypothesis, the best upper bound on prime gaps in is , and the best upper bound on semiprime gaps is not significantly better than this – in particular, one cannot reach for any . (There is a remote possibility that an extremely delicate analysis near , together with additional strong conjectures on the zeta function, such as a sufficiently quantitative version of the GUE hypothesis, may barely be able to resolve this problem, but I am skeptical of this, absent some further major breakthrough in analytic number theory.)

Using this and a standard probabilistic argument, Dumitrescu then established the following “near miss” to a negative answer to the above problem:

The project was launched on the day of the blog post, and has been running for a hectic 19 days thus far; see my personal log of the project for a day-by-day summary of events. From the perspective of raw implications resolved, the project is (as of the time of writing) 99.9963% complete: of the implications to resolve, have been proven to be true, have been proven to be false, and only remain open, although even within this set, there are implications that we conjecture to be false and which we are likely to be able to formally disprove soon. For reasons of compilation efficiency, we do not record the proof of every single one of these assertions in Lean; we only prove a smaller set of implications in Lean, which then imply the broader set of implications through transitivity (for instance, using the fact that if Equation implies Equation and Equation implies Equation , then Equation implies Equation ); we will also shortly implement a further reduction utilizing a duality symmetry of the implication graph.
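
The transitivity reduction mentioned here is easy to illustrate. The sketch below (plain Python, not the project's actual Lean or scripting infrastructure, with made-up law names) closes a small set of proven implications under transitivity, and also propagates known non-implications by the contrapositive rules that follow from transitivity.

```python
from itertools import product

def close_implications(implies, not_implies, laws):
    """Close a partially known implication graph under the partial-order axioms.

    `implies` / `not_implies` are sets of ordered pairs (X, Y) meaning
    "law X implies law Y" / "law X does not imply law Y"; anything in
    neither set stays open.  A toy sketch of the bookkeeping described above.
    """
    implies = set(implies) | {(x, x) for x in laws}   # reflexivity
    not_implies = set(not_implies)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, z) in product(list(implies), repeat=2):
            # transitivity: X => Y and Y => Z gives X => Z
            if y == u and (x, z) not in implies:
                implies.add((x, z))
                changed = True
        for (x, y) in list(implies):
            for (a, b) in list(not_implies):
                # X => Y and X =/=> B forces Y =/=> B (else X => B by transitivity)
                if a == x and (y, b) not in not_implies:
                    not_implies.add((y, b))
                    changed = True
                # X => Y and A =/=> Y forces A =/=> X
                if b == y and (a, x) not in not_implies:
                    not_implies.add((a, x))
                    changed = True
    return implies, not_implies

# hypothetical mini-example with four made-up law names
laws = {"Eq1", "Eq2", "Eq3", "Eq4"}
imp, non = close_implications({("Eq1", "Eq2"), ("Eq2", "Eq3")}, {("Eq1", "Eq4")}, laws)
print(sorted(imp))  # includes ("Eq1", "Eq3") by transitivity
print(sorted(non))  # includes ("Eq2", "Eq4") and ("Eq3", "Eq4")
```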

At a qualitative level, the Bayesian identity (1) is telling us the following: if an alternative hypothesis was already somewhat plausible (so that the prior odds were not vanishingly small), and the observed event was significantly more likely to occur under hypothesis than under , then the hypothesis becomes significantly more plausible (in that the posterior odds become quite elevated). This is quite intuitive, but as discussed in the previous post, a lot hinges on how one is defining the alternative hypothesis .

In this post, I would like to illustrate this computation-outsourced approach with the topic of zero-density theorems for the Riemann zeta function, in which all computer verifiable calculations (as well as other routine but tedious arguments) are performed “off-stage”, with the intent of focusing only on the conceptual inputs to these theorems.

Experience has shown that the most important quantity to control here is the exponent , defined as the least constant for which one has an asymptotic

Zero-density theorems concern upper bounds for the quantity for a given and large , which is defined as the number of zeroes of the Riemann zeta function in the rectangle . (There is also an important generalization of this quantity to -functions, but for simplicity we will focus on the classical zeta function case here). Such quantities are important in analytic number theory for many reasons, one of which is through explicit formulae such as the Riemann-von Mangoldt explicit formula relating the prime numbers to the zeroes of the zeta function (the “music of the primes”). The better bounds one has on , the more control one has on the complicated term on the right-hand side.

More generally, we are trying to document all of the processes and lessons learned from this setup; this will be part of a forthcoming paper on this project, which we are now in the preliminary stages of planning, and which will likely include dozens of authors.

Given a smooth compact Riemannian manifold , the incompressible Euler equations can be written in abstract index notation as

Theorem 2 (i) For any , there exists a set of natural numbers with for all large , for which (2) stays bounded. (ii) Conversely, if (2) stays bounded, then for all large .

From (1), (2) we can find sequences of -smooth numbers in , with each sequence being to the right of the previous sequence. By the above observation, each sequence contains some non-trivial subcollection that multiplies to a square. Concatenating all these subsequences together, we obtain a single sequence with at least partial products multiplying to a square, giving the desired lower bound on .

Proof: Let be a small constant, and let be a random subset of , formed by placing each element of with an independent probability of . A standard application of Hoeffding’s inequality (or even the second moment method) shows that this set will have cardinality with high probability if is large enough. On the other hand, each of the patterns has a probability of lying inside , so by linearity of expectation, the total number of such patterns inside is on the average. In particular, by Markov’s inequality, we can find a set of cardinality with only such patterns. Deleting all of these patterns from , we obtain a set of cardinality , which is if is a sufficiently small constant. This establishes the claim.

In the previous blog post, this calculation was initially illustrated with the following choices of , , and (thus fulfilling ingredient (i)): was the event that the October 1, 2022 PCSO Grand Lotto in the Philippines drew the numbers (that is to say, consecutive multiples of ), though not necessarily in that order; was the null hypothesis that the lottery was fair and the numbers were drawn uniformly at random (without replacement) from the set ; and was the alternative hypothesis that the lottery was rigged by some corrupt officials for their personal gain. In this case, ingredient (iii) can be computed mathematically and precisely: And, with a not inconceivable level of cynicism about the integrity of the lottery, the prior odds (ingredient (ii)) can be argued to be non-negligible. However, ingredient (iv) was nearly impossible to estimate: indeed, as argued in that post, there is no reason to suspect that is much larger than the tiny probability (2), and in fact it could well be smaller (since it would likely be in the interest of corrupt lottery officials to not draw attention to their activities). So, despite the very small numerical value of the probability (2), this did not lead to any significant increase in the odds of the alternative hypothesis. In the previous blog post, several other variants , , of the alternative hypothesis were also discussed; the conclusion was that while some choices of alternative hypothesis could lead to elevated probabilities for ingredient (iv), they came at the cost of significantly reducing the prior odds in ingredient (ii), and so no alternative hypothesis was located which ended up being significantly more plausible than the null hypothesis after observing the event .

I’ve just uploaded to the arXiv my paper “Planar point sets with forbidden -point patterns and few distinct distances“. This (very) short paper was a byproduct of my explorations of the Erdös problem website in recent months, with a vague emerging plan to locate a problem that might be suitable for some combination of a crowdsourced “Polymath” style project and/or a test case for emerging AI tools. The question below was one potential candidate; however, upon reviewing the literature on the problem, I noticed that the existing techniques only needed one additional tweak to fully resolve the problem. So I ended up writing this note instead to close off the problem.

I’ve arranged this post so that this additional trick is postponed to below the fold, so that the reader can, if desired, try to guess for themselves what the final missing ingredient needed to solve the problem was. Here is the problem (Erdös problem #135), which Erdös asked multiple times over more than two decades (and for which he even offered a small prize on one of these occasions):

Because of this, I suspect that it will not be possible to resolve this Erdös problem without a major breakthrough on the parity problem that (at a bare minimum) is enough to exclude the possibility of Siegel zeroes existing. (But it is not clear at all that Siegel zeroes are the only “enemy” here, so absent a major advance in “inverse sieve theory”, one cannot simply assume GRH to run away from this problem).

Let us now explore the problem further. Let us call a natural number bad if , so the first part of the problem is asking whether there exist bad numbers that are sufficiently large. We unpack the definitions: is bad if and only if for any composite , so placing in intervals of the form we are asking to show that

17 September, 2024 in math.CA, math.DS, math.NT, paper | Tags: Ben Krause, ergodic theory, Hamed Mousavi, Joni Teravainen | by Terence Tao | 9 comments

This is a cousin of the significantly more famous Erdös distinct distances problem (Erdös problem #89), which asks what is the minimum number of distances determined by a set of points in the plane, without the restriction on four-point configurations. The example of a square grid (assuming for sake of argument that is a perfect square), together with some standard analytic number theory calculations, shows that can determine distances, and it is conjectured that this is best possible up to constants. A celebrated result of Guth and Katz, discussed in this previous blog post, shows that will determine at least distances. Note that the lower bound here is far larger, and in fact comparable to the total number of distances available, thus expressing the belief that the “local” condition that every four points determine at least five distances forces the global collection of distances to be almost completely distinct. In fact, in one of the papers posing the problem, Erdös made the even stronger conjecture that the set must contain a subset of cardinality for which all the distances generated by are distinct.
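
For intuition, one can count the distinct distances in a small square grid directly. The short Python sketch below (with arbitrary small grid sizes) counts distinct squared distances; the count grows roughly like N/sqrt(log N) for N points, in line with the grid example just mentioned.

```python
import math

def distinct_distances_in_grid(n):
    """Count the distinct distances determined by the n x n integer grid.

    Squared distances suffice for counting, and they are exactly the
    integers a^2 + b^2 with 0 <= a, b <= n-1, not both zero.
    """
    squared = {a * a + b * b for a in range(n) for b in range(n)}
    squared.discard(0)
    return len(squared)

for n in [10, 30, 100]:   # arbitrary small grid sizes
    N = n * n             # number of points in the grid
    print(n, distinct_distances_in_grid(n), round(N / math.sqrt(math.log(N))))
```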

A brute force computation of the poset would then require comparisons, which looks rather daunting; but of course due to the axioms of a partial order, one could presumably identify the poset by a much smaller number of comparisons. I am thinking that it should be possible to crowdsource the exploration of this poset in the form of submissions to a central repository (such as the github repository I just created) of proofs in Lean of implications or non-implications between various equations, which could be validated in Lean, and also checked against some file recording the current status (true, false, or open) of all the comparisons, to avoid redundant effort. Most submissions could be handled automatically, with relatively little human moderation required; and the status of the poset could be updated after each such submission.

Problem 1 Let be integers. How many of the partial products , , , can be squares? Is it true that, for any , there can be more than squares?

The above Hasse diagram does not just assert implications between the listed equational axioms; it also asserts non-implications between the axioms. For instance, as seen in the diagram, the commutative axiom Equation7 does not imply the Equation4 axiom

To illustrate the project I have in mind, let me first introduce eleven examples of equational axioms for magmas: Equation1: Equation2: Equation3: Equation4: Equation5: Equation6: Equation7: Equation8: Equation9: Equation10: Equation11: Thus, for instance, Equation7 is the commutative axiom, and Equation10 is the associative axiom. The constant axiom Equation1 is the strongest, as it forces the magma to have at most one element; at the opposite extreme, the reflexive axiom Equation11 is the weakest, being satisfied by every single magma.

A paper of Dumitrescu came close to resolving this problem. Firstly, the number of ways in which four points could fail to determine five distinct distances was classified in that paper, with the four-point configurations necessarily being one of the following eight patterns:

- : An equilateral triangle plus an arbitrary vertex.
- : A parallelogram.
- : An isosceles trapezoid (four points on a line, , where , form a degenerate isosceles trapezoid).
- : A star with three edges of the same length.
- : A path with three edges of the same length.
- : A kite.
- : An isosceles triangle plus an edge incident to a base endpoint, and whose length equals the length of the base.
- : An isosceles triangle plus an edge incident to the apex, and whose length equals the length of the base.

(See Figure 1 and Lemma 1 of Dumitrescu’s paper.) So the question is asking whether if an point set avoids all of these patterns , then it must generate distances.

At first glance, this looks like a somewhat arbitrary problem (as many of Erdös’s problems initially do), as the function is not obviously related to any other well-known function or problem. However, it turns out that this problem is closely related to the parity barrier in sieve theory (as discussed in this previous post), with the possibility of Siegel zeroes presenting a particular obstruction. I suspect that Erdös was well aware of this connection; certainly he mentions the relation with questions on gaps between primes (or almost primes), which is in turn connected to the parity problem and Siegel zeroes (as discussed recently in my paper with Banks and Ford, and in more depth in these papers of Ford and of Granville).

Given that multiplicative number theory does not seem powerful enough (even on RH) to resolve these problems, the other main approach would be to use sieve theory. In this theory, we do not really know how to exploit the specific location of the interval or the specific congruence classes used, so one can study the more general problem of trying to cover an interval of length by one residue class mod for each , and only leaving a small number of survivors which could potentially be classified as “primes”. The discussion of the small case already reveals a problem with this level of generality: one can sieve out the interval by the residue classes for , and leave only one survivor, . Indeed, thanks to known bounds on Jacobsthal’s function, one can be more efficient than this; for instance, using equation (1.2) from this paper of Ford, Green, Konyagin, Maynard, and myself, it is possible to completely sieve out any interval of sufficiently large length using only those primes up to . On the other hand, from the work of Iwaniec, we know that sieving up to is insufficient to completely sieve out such an interval; related to this, if one only sieves up to for some , the linear sieve (see e.g., Theorem 2 of this previous blog post) shows that one must have at least survivors, where can be given explicitly in the regime by the formula
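
As a crude numerical illustration of this general covering problem (not of the sharp results of Ford–Green–Konyagin–Maynard or of Iwaniec quoted above), one can greedily pick, for each prime p up to a sieving limit z, the residue class mod p that removes the most surviving elements of an interval of length y, and see how many survivors remain. A hedged Python sketch with arbitrary small parameters follows.

```python
def greedy_sieve(y, z):
    """Greedily sieve {0, ..., y-1} using one residue class mod p per prime p <= z.

    For each prime p, choose the residue class hitting the most surviving
    elements, then delete that class.  Returns the set of survivors.
    This is only a toy illustration of the covering problem described above.
    """
    # primes up to z by a simple sieve
    is_prime = [True] * (z + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(z ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, z + 1, p):
                is_prime[q] = False
    primes = [p for p in range(2, z + 1) if is_prime[p]]

    survivors = set(range(y))
    for p in primes:
        if not survivors:
            break
        counts = {}
        for n in survivors:
            counts[n % p] = counts.get(n % p, 0) + 1
        best = max(counts, key=counts.get)            # most destructive class mod p
        survivors = {n for n in survivors if n % p != best}
    return survivors

print(len(greedy_sieve(y=10_000, z=100)))  # arbitrary small parameters for illustration
```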

7 July, 2024 in expository, math.NT, Uncategorized | Tags: computer assistance, Riemann zeta function, zero density theorems | by Terence Tao | 7 comments

Given that we have one “near-miss” in the literature that avoids , and another “near-miss” that avoids , it is natural to try to combine these two constructions to obtain a set that avoids all eight patterns . This inspired the following problem of Dumitrescu (see Problem 2 of this paper):

It is now tempting to try to set up an approximately scale-invariant blowup solution. It seems that the first step in this is to construct a “soliton” type localized steady state solution, that is a solution , to the equation

So, as an experiment, I thought I would record here my preliminary observations on one such problem – Erdös problem #385 – to discuss why it looks difficult to solve with our current understanding of the primes. Here is the problem:

One recent example of such a project was the Busy Beaver Challenge, which showed this July that the fifth Busy Beaver number was equal to . Some older crowdsourced computational projects, such as the Great Internet Mersenne Prime Search (GIMPS), are also somewhat similar in spirit to this type of project (though using more traditional proof of work certificates instead of proof assistants). I would be interested in hearing of any other extant examples of crowdsourced projects exploring a mathematical space, and whether there are lessons from those examples that could be relevant for the project I propose here.

This ends the survey of the prior literature on this problem. Can you guess the missing ingredient needed to resolve the problem? I will place the answer below the fold.

We now prove the lower bound on , which is a variant of Theorem 2. The key observation is that given any -smooth numbers , some non-trivial subcollection of them will multiply to a square. This is essentially Lemma 4.2 of Bui–Pratt–Zaharescu, but for the convenience of the reader we give a full proof here. Consider the multiplicative homomorphism defined by
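
This observation is just linear algebra over the field of two elements: a smooth number corresponds to the vector of its prime-exponent parities, so any collection of smooth numbers larger than the number of available primes is linearly dependent, and a dependency is precisely a non-trivial subcollection whose product is a square. Below is a small Python sketch of this argument (a naive Gaussian elimination written purely for illustration; it is not taken from the Bui–Pratt–Zaharescu paper).

```python
def primes_up_to(y):
    sieve = [True] * (y + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(y ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_p in enumerate(sieve) if is_p]

def parity_vector(n, primes):
    """Exponent-parity vector of n over the given primes, as a bitmask."""
    vec = 0
    for i, p in enumerate(primes):
        while n % p == 0:
            n //= p
            vec ^= 1 << i
    assert n == 1, "input was not smooth over the given primes"
    return vec

def square_subproduct(numbers, y):
    """Find a non-empty subcollection of y-smooth `numbers` whose product is a square.

    Guaranteed to succeed once len(numbers) exceeds the number of primes up to y,
    by linear dependence over the field of two elements.
    """
    primes = primes_up_to(y)
    basis = {}  # pivot bit -> (reduced vector, set of indices producing it)
    for idx, n in enumerate(numbers):
        vec, support = parity_vector(n, primes), {idx}
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, support)
                break
            bvec, bsupport = basis[pivot]
            vec ^= bvec
            support ^= bsupport  # symmetric difference of index sets
        if vec == 0:
            return sorted(numbers[i] for i in support)
    return None

print(square_subproduct([10, 15, 6, 21, 14, 35], y=7))  # e.g. [6, 10, 15]: 6*10*15 = 30^2
```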

This problem is mentioned on page 73 of this 1979 paper of Erdös (where he attributes the problem to an unpublished work of Eggleton, Erdös, and Selfridge that, to my knowledge, has never actually appeared), as well as briefly on page 92 of this 1980 paper of Erdös and Graham.

It would be nice to be able to integrate such a project with some sort of graph visualization software that can take an incomplete determination of the poset as input (in which each potential comparison in is marked as either true, false, or open), completes the graph as much as possible using the axioms of partial order, and then presents the partially known poset in a visually appealing way. If anyone knows of such a software package, I would be happy to hear of it in the comments.

I will remark that the general question of determining whether one set of equational axioms determines another is undecidable; see Theorem 14 of this paper of Perkins. (This is similar in spirit to the more well known undecidability of various word problems.) So, the situation here is somewhat similar to the Busy Beaver Challenge, in that past a certain point of complexity, we would necessarily encounter unsolvable problems; but hopefully there would be interesting problems and phenomena to discover before we reach that threshold.

It then remained to improve the lower bound construction to eliminate the losses in the exponents. By deconstructing the proof of the upper bound, it became natural to consider something like the set of natural numbers that had at most prime factors. This construction actually worked for some scales – namely those for which was a natural number – but there were some strange “discontinuities” in the analysis that prevented me from establishing the boundedness of (2) for arbitrary scales . The basic problem was that increasing the number of permitted prime factors from one natural number threshold to another ended up increasing the density of the set by an unbounded factor (of the order of , in practice), which heavily disrupted the task of trying to keep the ratio (2) bounded. Usually the resolution to these sorts of discontinuities is to use some sort of random “average” of two or more deterministic constructions – for instance, by taking some random union of some numbers with prime factors and some numbers with prime factors – but the numerology turned out to be somewhat unfavorable, allowing for some improvement in the lower bounds over my previous construction, but not enough to close the gap entirely. It was only after substantial trial and error that I was able to find a working deterministic construction, where at a given scale one collected either numbers with at most prime factors, or numbers with prime factors but with the largest prime factor in a specific range, in which I could finally get the numerator and denominator in (2) to be in balance for every . But once the construction was written down, the verification of the required properties ended up being quite routine.

If one lets denote the maximal number of squares amongst such partial products, it was observed in the paper of Erdös and Graham that the bound is “trivial” (no proof was provided, but one can for instance argue using the fact that the number of integer solutions to hyperelliptic equations of the form for fixed is quite sparse, and in fact finite for thanks to Siegel’s theorem), and the problem then asks if .

From (1) we see that the number of for which (i) occurs is at most . From the union bound we see that the number of for which (ii) occurs is at most

Anyway, I would be happy to receive any feedback on this project; in addition to the previous requests, I would be interested in any suggestions for improving the project, as well as gauging whether there is sufficient interest in participating to actually launch it. (I am imagining running it vaguely along the lines of a Polymath project, though perhaps not formally labeled as such.)

Ingredient (ii) – the prior odds that is true over – is highly subjective, and an individual’s estimation of (ii) would likely depend on, or at least be correlated with, their opinion of the current Venezuelan administration. Discussion of this ingredient is therefore more political than mathematical, and I will not attempt to quantify it further here.

Now we turn to (iii), the estimation of the probability that occurs given the hypothesis . This cannot be computed exactly without a precise probabilistic model of the voting electorate, but let us make a rough order of magnitude calculation as follows. One can focus on the anomaly just for the number of votes received by Maduro and González, since if both of these counts were the nearest integer to a round percentage of the total, then just from simple subtraction the number of votes for “other” would also be forced to be the nearest integer to a round percentage, possibly plus or minus one due to carries, so up to a factor of two or so we can ignore the latter anomaly. As a simple model, suppose that the voting percentages for Maduro and González were distributed more or less uniformly in some square , where are some proportions not too close to either or , and is some reasonably large margin of error (the exact values of these parameters will end up not being too important, nor will the specific shape of the distribution; indeed, the shape and size of the square here only impacts the analysis through the area of the square, and even this quantity cancels itself out in the end). Thus, the number of votes for Maduro is distributed in an interval of length about , where is the number of voters, and similarly for González, so the total number of different outcomes here is , and by our model we have a uniform distribution amongst all these outcomes. On the other hand, the total number of attainable round percentages for Maduro is about , and similarly for González, so our estimate for is
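
One can translate this back-of-the-envelope model into a few lines of arithmetic. The sketch below assumes (as a hypothetical reading of the reported figures) that the percentages were reported to one decimal place, so that about 1000 "round" values are attainable per unit of vote share; as noted in the text, the window width and shape cancel out of the final estimate.

```python
N = 10_058_774        # total reported votes
granularity = 1_000   # assumed: percentages rounded to one decimal place,
                      # i.e. about 1000 attainable "round" values per unit of vote share

# In a window of vote-share width 2*eps, a candidate's count ranges over about
# 2*eps*N values, of which about 2*eps*granularity are nearest-integer images
# of a round percentage; eps cancels from the ratio.
prob_one_candidate = granularity / N
prob_E_given_null = prob_one_candidate ** 2   # Maduro and González jointly
print(prob_E_given_null)                      # roughly 1e-8 under these assumptions
```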

(For an explanation of what is going on under the assumption of the Lindelöf hypothesis, see below the fold.) This plot represents the combined effort of nearly a dozen papers, each one of which claims one or more components of the depicted piecewise smooth curve, and is written in the “human-readable” style mentioned above, where the argument is arranged to reduce the amount of tedious computation to human-verifiable levels, even if this comes at the cost of obscuring the conceptual ideas. (For an animation of how this bound improved over time, see here.) Below the fold, I will try to describe (in sketch form) some of the standard ingredients that go into these papers, in particular the routine reduction of deriving zero density estimates from large value theorems for Dirichlet series. We will not attempt to rewrite the entire literature of zero-density estimates in this fashion, but focus on some illustrative special cases.

However, I believe that this sort of paradigm can also be used to explore new mathematics, as opposed to formalizing existing mathematics. The online collaborative “Polymath” projects that several people including myself organized in the past are one example of this; but as they did not incorporate proof assistants into the workflow, the contributions had to be managed and verified by the human moderators of the project, which was quite a time-consuming responsibility, and one which limited the ability to scale these projects up further. But I am hoping that the addition of proof assistants will remove this bottleneck.

Problem 1 Is it true that if is a set of natural numbers for which goes to infinity as , then the quantity also goes to infinity as ?

The purpose of this blog post is to record this modification of the argument, which is short enough to present immediately. For a large , let denote the quantity

The metric is hidden in this system through the covariant derivative . To eliminate the metric, we can lower indices to write

UPDATE, Sep 30 2024: The project is up and running (and highly active), with the main page being this Github repository. See also the Lean Zulip chat for some (also very active) discussion on the project.

In a recent article, Heather Macbeth argues that the preferred balance between conceptual and computational arguments is quite different for a computer-assisted proof than it is for a purely human-readable proof. In the latter, there is a strong incentive to minimize the amount of calculation to the point where it can be checked by hand, even if this requires a certain amount of ad hoc rearrangement of cases, unmotivated parameter selection, or otherwise non-conceptual additions to the arguments in order to reduce the calculation. But in the former, once one is willing to outsource any tedious verification or optimization task to a computer, the incentives are reversed: freed from the need to arrange the argument to reduce the amount of calculation, one can now describe an argument by listing the main ingredients and then letting the computer figure out a suitable way to combine them to give the stated result. The two approaches can thus be viewed as complementary ways to describe a result, with neither necessarily being superior to the other.

The upper bound arguments seem more crude to the author than the lower bound arguments, so I conjecture that the lower bound is in fact the truth: .

Problem 1 (Erdös Problem #385) Let where is the least prime divisor of . Is it true that for all sufficiently large ? Does as ?

It turns out, though, that the answer to the above problem is negative; one can find sets that are denser than the primes, but for which (2) stays bounded, so that the least common multiples in the set are unusually large. It was a bit surprising to me that this question had not been resolved long ago (in fact, I was not able to find any prior literature on the problem beyond the original reference of Erdős and Graham); in contrast, another problem of Erdős and Graham concerning sets with unusually small least common multiples was extensively studied (and essentially solved) about twenty years ago, while the study of sets with unusually large greatest common divisor for many pairs in the set has recently become somewhat popular, due to their role in the proof of the Duffin-Schaeffer conjecture by Koukoulopoulos and Maynard.

Perhaps the only thing that I was expecting to see at this point that has not yet materialized is significant contributions from modern AI tools. They are being used in a number of secondary ways in this project, for instance through tools such as Github copilot to speed up the writing of Lean proofs, the LaTeX blueprint, and other software code, and several of our visualization tools were also largely co-written using large language models such as Claude. However, for the core task of resolving implications, the more “good old-fashioned” automated theorem provers have so far proven superior. That said, most of the remaining 700 or so implications are not amenable to these older tools, and several (particularly the ones involving “Asterix” and “Obelix”) had stymied the human collaborators for several days, so I can still see modern AI playing a more active role in finishing off the hardest and most stubborn of the remaining implications.

Traditionally, mathematics research projects are conducted by a small number (typically one to five) of expert mathematicians, each of whom is familiar enough with all aspects of the project that they can verify each other’s contributions. It has been challenging to organize mathematical projects at larger scales, and particularly those that involve contributions from the general public, due to the need to verify all of the contributions; a single error in one component of a mathematical argument could invalidate the entire project. Furthermore, the sophistication of a typical math project is such that it would not be realistic to expect a member of the public, with say an undergraduate level of mathematics education, to contribute in a meaningful way to many such projects.

Roughly speaking, their argument proceeds by a multiscale construction, in which the solution is set up to eventually have some presence at a spatial scale , which is conducive to generating an exponential “stretching” of a small forcing term at a much higher spatial scale , which one then introduces to then set up the solution for the next scale.

Proof assistant languages, such as Lean, provide a potential way to overcome these obstacles, and allow for large-scale collaborations involving professional mathematicians, the broader public, and/or AI tools to all contribute to a complex project, provided that it can be broken up in a modular fashion into smaller pieces that can be attacked without necessarily understanding all aspects of the project as a whole. Projects to formalize an existing mathematical result (such as the formalization of the recent proof of the PFR conjecture of Marton, discussed in this previous blog post) are currently the main examples of such large-scale collaborations that are enabled via proof assistants. At present, these formalizations are mostly crowdsourced by human contributors (which include both professional mathematicians and interested members of the general public), but there are also some nascent efforts to incorporate more automated tools (either “good old-fashioned” automated theorem provers, or more modern AI-based tools) to assist with the (still quite tedious) task of formalization.

9 August, 2024 in expository, math.NT | Tags: Alexandru Zaharescu, Hung Bui, Kyle Pratt, Paul Erdos | by Terence Tao | 14 comments

Clearly is non-increasing in . The Riemann-von Mangoldt formula, together with the functional equation, gives us the asymptotic

25 September, 2024 in math.RA, polymath | Tags: Artificial Intelligence, Equational Theory Project, machine assisted proof, universal algebra | by Terence Tao | 70 comments

To search for counterexamples, it is natural to look for numbers with relatively few prime factors, in order to reduce their common factors and increase their least common multiple. A particularly simple example, whose verification is on the level of an exercise in a graduate analytic number theory course, is the set of semiprimes (products of two primes), for which one can readily verify that (1) grows like but (2) stays bounded. With a bit more effort, I was able to optimize the construction and uncover the true threshold for boundedness of (2), which was a little unexpected:

These lower bounds are not believed to be best possible. For instance, the Maier–Pomerance conjecture on Jacobsthal’s function would indicate that one needs to sieve out primes up to in order to completely sieve out an interval of length , and it is also believed that sieving up to should leave survivors, although even these strong conjectures are not enough to positively resolve this problem, since we are permitted to sieve all the way up to (and we are allowed to leave every prime number as a survivor, which in view of the Brun–Titchmarsh theorem could permit as many as survivors).

However, the presence of a Siegel zero tends to “magnetize” the error term by pulling most of the fluctuations in a particular direction. In many situations, what this means is that one can obtain a refined asymptotic of the form

This in particular answers the MathOverflow question of whether there were equational axioms intermediate between the constant axiom Equation1 and the associative axiom Equation10.

It is now natural to try to understand this problem for a specific choice of interval as a function of . If is large in the sense that , then the claimed covering property is automatic, since every composite number less than or equal to has a prime factor less than or equal to . On the other hand, for very small, in particular , it is also possible to find with this property. Indeed, if one takes to lie in the residue class , then we see that the residue classes cover all of except for , and from Linnik’s theorem we can ensure that is prime. Thus, to rule out bad numbers, we need to understand the covering problem at intermediate scales .

Dumitrescu then counted how often each of the patterns occurred inside the grid . The answer is: does not occur at all. (This is related to the irrationality of .) occurs times. occurs times. occurs times. occurs times. occurs times. occurs times. occurs times. (The bounds involving were obtained using the Szemerédi-Trotter theorem, and might not be optimal for this problem.) In particular, with the exception of the parallelogram pattern , the other seven forbidden -point patterns occur at most times.

Theorem 3 (Bounds for ) As , we have the lower bound and the upper bound In particular, for any , one has for sufficiently large .

“The parity problem can also sometimes be overcome when there is an exceptional Siegel zero … [this] suggests that to break the parity barrier, we may assume without loss of generality that there are no Siegel zeroes.”

In particular, this generates a set of points with distances that avoids seven out of the eight required forbidden patterns; it is only the parallelograms that are not avoided, and these are the only remaining obstacle to a negative answer to the problem.

Most of the implications here are quite easy to prove, but there is one non-trivial one, obtained in this answer to a MathOverflow post closely related to the preceding one:

Unfortunately, this random set contains far too many parallelograms ( such parallelograms, in fact) for this deletion argument to work. On the other hand, in earlier work of Thiele and of Dumitrescu, a separate construction of a set of points in that avoids all of the parallelograms was given:

Almost three weeks ago, I proposed a collaborative project, combining the efforts of professional and amateur mathematicians, automatic theorem provers, AI tools, and the proof assistant language Lean, to describe the implication graph relating the 4694 equational laws for magmas that can be expressed using up to four invocations of the magma operation. That is to say, one needs to determine the truth or falsity of the possible implications between these 4694 laws.

In this post, I would like to run the same analysis on a numerical anomaly in the recent Venezuelan presidential election of July 28, 2024. Here are the officially reported vote totals for the two main candidates, incumbent president Nicolás Maduro and opposition candidate Edmundo González, in the election:

Maduro: 5,150,092 votes
González: 4,445,978 votes
Other: 462,704 votes
Total: 10,058,774 votes

The numerical anomaly is that if one multiplies the total number of voters by the round percentages , , , one recovers exactly the above vote counts after rounding to the nearest integer:
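
The anomaly itself is a one-line computation to check. In the sketch below, the round percentages 51.2%, 44.2% and 4.6% are the widely reported figures; they are not spelled out in the excerpt above, so treat them as an assumption of the illustration.

```python
totals = {"Maduro": 5_150_092, "González": 4_445_978, "Other": 462_704}
total_votes = 10_058_774
# Widely reported round percentages (an assumption of this sketch, not quoted above).
percentages = {"Maduro": 0.512, "González": 0.442, "Other": 0.046}

for name, count in totals.items():
    reconstructed = round(total_votes * percentages[name])
    print(name, count, reconstructed, count == reconstructed)
# All three reported counts are recovered exactly after rounding to the nearest integer.
```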

In between the papers of Huxley and Guth-Maynard are dozens of additional improvements on , though it is only the Guth-Maynard paper that actually lowered the supremum norm . A summary of most of the state of the art before Guth-Maynard may be found in Table 2 of this recent paper of Trudgian and Yang; it is complicated, but it is easy enough to get a computer to illustrate it with a plot:
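
As a hedged illustration of what such a plot looks like (using only three classical ingredients rather than the full Trudgian–Yang table, so this understates the true state of the art), one can plot the pointwise minimum of the Ingham bound 3/(2-sigma), the Huxley bound 3/(3*sigma-1) (which is the better of the two for sigma at least 3/4), and the Guth–Maynard bound 30/13.

```python
import numpy as np
import matplotlib.pyplot as plt

# A crude illustration only: three classical bounds on the zero-density exponent
# A(sigma), in the convention N(sigma, T) << T^{A(sigma)(1 - sigma) + o(1)}.
# The genuine state of the art (Table 2 of Trudgian-Yang) is far more intricate.
sigma = np.linspace(0.5, 0.999, 500)

ingham = 3 / (2 - sigma)                     # Ingham (1940)
huxley = 3 / (3 * sigma - 1)                 # Huxley (1972), useful for sigma >= 3/4
guth_maynard = np.full_like(sigma, 30 / 13)  # Guth-Maynard (2024)

best = np.minimum(np.minimum(ingham, huxley), guth_maynard)

plt.plot(sigma, ingham, "--", label="Ingham: 3/(2-sigma)")
plt.plot(sigma, huxley, "--", label="Huxley: 3/(3*sigma-1)")
plt.plot(sigma, guth_maynard, "--", label="Guth-Maynard: 30/13")
plt.plot(sigma, best, "k", label="pointwise minimum")
plt.xlabel("sigma")
plt.ylabel("bound on A(sigma)")
plt.legend()
plt.show()
```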

This remains open, however there has been progress on rougher versions of this problem. For instance, there is the well-known result of Elgindi (discussed in this previous post) that when and is sufficiently small, there exists a solution to the Euler equations on that blows up in finite time. There has also been progress in establishing various “universality” properties of the Euler flow on manifolds (which informally state that “fluid computers” are possible); see for instance this recent survey of Cardona, Miranda, and Peralta-Salas. Unfortunately, these “fluid computers” do not combine well with scaling symmetries, and so thus far have not been able to produce (finite energy) blowups.

At first glance, this problem may seem rather arbitrary, but it can be motivated as follows. The hypothesis that (1) goes to infinity is a largeness condition on ; in view of Mertens’ theorem, it can be viewed as an assertion that is denser than the set of primes. On the other hand, the conclusion that (2) grows is an assertion that becomes significantly larger than on the average for large ; that is to say, that many pairs of numbers in share a common factor. Intuitively, the problem is then asking whether sets that are significantly denser than the primes must start having lots of common factors on average.

Theorem 3 (Second near miss) For large, there exists a subset of of cardinality which contains no parallelograms . Furthermore, this set is in general position: no three points in are collinear, and no four are concyclic. As a consequence, this set in fact avoids the three patterns (the pattern in is concyclic, and the pattern does not occur at all in the grid).

It is then natural to try a random construction, in which one sieves out the natural numbers by permitting each natural number to survive with a probability resembling , in order to get the predicted behavior for . Performing some standard calculations, this construction could ensure that (2) stays bounded with a density a little bit less than the one stated in the main theorem; after optimizing the parameters, I could only get something like

Unfortunately, this problem looked difficult, as the number-theoretic task of counting the patterns in looked quite daunting.

In a previous blog post, I discussed how, from a Bayesian perspective, learning about some new information can update one’s perceived odds about how likely an “alternative hypothesis” is, compared to a “null hypothesis” . The mathematical formula here is

Thus, provided one has

(i) A precise formulation of the null hypothesis and the alternative hypothesis , and the new information ;
(ii) A reasonable estimate of the prior odds of the alternative hypothesis being true (compared to the null hypothesis );
(iii) A reasonable estimate of the probability that the event would occur under the null hypothesis ; and
(iv) A reasonable estimate of the probability that the event would occur under the alternative hypothesis ,

one can then obtain an estimate of the posterior odds of the alternative hypothesis being true after observing the event . Ingredient (iii) is usually the easiest to compute, but is only one of the key inputs required; ingredients (ii) and (iv) are far trickier, involve subjective non-mathematical considerations, and as discussed in the previous post, depend rather crucially on ingredient (i). For instance, selectively reporting some relevant information for , and withholding other key information from , can make the final mathematical calculation misleading with regard to the “true” odds of being true compared to based on all available information, even if the calculation was technically accurate with regards to the partial information provided. Also, the computation of the relative odds of two competing hypotheses and can become somewhat moot if there is a third hypothesis that ends up being more plausible than either of these two hypotheses. Nevertheless, the formula (1) can still lead to useful conclusions, albeit ones that are qualified by the particular assumptions and estimates made in the analysis.
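
For concreteness, formula (1) is simply "posterior odds equals prior odds times the likelihood ratio of the event under the two hypotheses"; a tiny helper with made-up numbers, purely for illustration, is below.

```python
def posterior_odds(prior_odds, p_event_given_alt, p_event_given_null):
    """Bayes' identity (1): posterior odds of the alternative over the null after observing E."""
    return prior_odds * (p_event_given_alt / p_event_given_null)

# Made-up illustrative numbers: even a tiny P(E | null) does not help the
# alternative much unless P(E | alternative) is substantially larger
# (ingredient (iv) in the discussion above).
print(posterior_odds(prior_odds=1e-3, p_event_given_alt=1e-8, p_event_given_null=1e-8))  # 1e-3
print(posterior_odds(prior_odds=1e-3, p_event_given_alt=1e-2, p_event_given_null=1e-8))  # 1e+3
```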

As a model problem, I tried to reproduce this type of result from a more geometric perspective, trying to aim for a more “self-similar” blowup than a “multi-scale” one, in the hope that a self-similar blowup might be more tractable to analyze and eventually resolve Conjecture 1. I didn’t fully succeed; but I think the approach I outline below is in principle feasible.

Let us try to apply the above Bayesian framework to this situation, bearing in mind the caveat that this analysis is only as strong as the inputs supplied and assumptions made (for instance, to simplify the discussion, we will not discuss information from exit polling, which in this case gave significantly different predictions from the percentages above).

I’ve just uploaded to the arXiv my paper “Dense sets of natural numbers with unusually large least common multiples“. This short paper answers (in the negative) a somewhat obscure question of Erdős and Graham:

For sake of comparison, it is easy to see that if (1) goes to infinity, then at least one pair of distinct elements in must have a non-trivial common factor. For if this were not the case, then the elements of are pairwise coprime, so each prime has at most one multiple in , and so can contribute at most to the sum in (1), and hence by Mertens’ theorem, and the fact that every natural number greater than one is divisible by at least one prime , the quantity (1) stays bounded, a contradiction.

It turns out that this problem was essentially solved (though not explicitly) by a recently published paper of Bui, Pratt, and Zaharescu, who studied a closely related quantity introduced by Erdös, Graham, and Selfridge (see also Problem B30 of Guy’s book), defined for any natural number as the least natural number such that some subset of , when multiplied together with , produces a square. Among the several results proven about in that paper was the following:


Many modern mathematical proofs are a combination of conceptual arguments and technical calculations. There is something of a tradeoff between the two: one can add more conceptual arguments to try to reduce the technical computations, or vice versa. (Among other things, this leads to a Berkson paradox-like phenomenon in which a negative correlation can be observed between the two aspects of a proof; see this recent Mastodon post of mine for more discussion.)

Over the last few weeks, I’ve learned that many of these laws have previously appeared in the literature, and compiled a “tour” of these equations here. For instance, in addition to the very well known commutative law (Equation 43) and associative law (Equation 4512), some equations (Equation 14, Equation 29, Equation 381, Equation 3722, and Equation 3744) appeared in some Putnam math competitions; Equation 168 defines a fascinating structure, known as a “central groupoid”, that was studied in particular by Evans and by Knuth, and was a key inspiration for the Knuth-Bendix completion algorithm; and Equation 1571 classifies abelian groups of exponent two.

The manifold I will work on is a cylinder , where is a smooth compact manifold, and the metric on is just the sum of the standard metric on the first coordinate and :

I am particularly interested in the possibility of using these modern tools to explore a class of many mathematical problems at once, as opposed to the current approach of focusing on only one or two problems at a time. This seems like an inherently modularizable and repetitive task, which could particularly benefit from both crowdsourcing and automated tools, if given the right platform to rigorously coordinate all the contributions; and it is a type of mathematics that previous methods usually could not scale up to (except perhaps over a period of many years, as individual papers slowly explore the class one data point at a time until a reasonable intuition about the class is attained). Among other things, having a large data set of problems to work on could be helpful for benchmarking various automated tools and compare the efficacy of different workflows.

I would imagine that there is some “low-hanging fruit” that could establish a large number of implications (or anti-implications) quite easily. For instance, laws such as Equation2 or Equation3 more or less completely describe the binary operation , and it should be quite easy to check which of the laws are implied by either of these two laws. The poset has a reflection symmetry associated to replacing the binary operator by its reflection , which in principle cuts down the total work by a factor of about two. Specific examples of magmas, such as the natural numbers with the addition operation, obey some set of equations in but not others, and so could be used to generate a large number of anti-implications. Some existing automated proving tools for equational logic, such as Prover9 and Mace4 (for obtaining implications and anti-implications respectively), could then be used to handle most of the remaining “easy” cases (though some work may be needed to convert the outputs of such tools into Lean). The remaining “hard” cases could then be targeted by some combination of human contributors and more advanced AI tools.

The Erdös problem site was created last year, and announced earlier this year on this blog. Every so often, I have taken a look at a random problem from the site for fun. A few times, I was able to make progress on one of the problems, leading to a couple papers; but the more common outcome is that I play around with the problem for a while, see why the problem is difficult, and then eventually give up and do something else. But, as is common in this field, I don’t make public the observations that I made, and the next person who looks at the same problem would likely have to go through the same process of trial and error to work out what the main obstructions are.

Proof: One uses an explicit algebraic construction, going back to an old paper of Erdös and Turán involving constructions of Sidon sets. Namely, one considers the set where is a prime between and (the existence of which is guaranteed by Bertrand’s postulate). Standard Gauss sum estimates can be used to show that has cardinality . If contained four points that were in a parallelogram or on a circle, or three points in a line, then one could lift up from to the finite field plane and conclude that the finite field parabola also contained four points in a parallelogram or a circle, or three points on a line. But straightforward algebraic calculations can be performed to show that none of these scenarios can occur. For instance, if were four points on a parallelogram that were contained in a parabola, this would imply that an alternating sum of the form

Unfortunately, as discussed in this previous blog post, the parity problem blocks such improvements from taking place from most standard analytic number theory methods, in particular sieve theory. A particularly dangerous enemy arises from Siegel zeroes. This is discussed in detail in the papers of Ford and of Granville mentioned previously, but an informal discussion is as follows. If there is a Siegel zero associated to the quadratic character of some conductor , this roughly speaking means that almost all primes (in certain ranges) will be quadratic non-residues mod . In particular, if one restricts attention to numbers in a residue class that is a quadratic residue, we then expect most numbers in this class to have an even number of prime factors, rather than an odd number.

The arguments were in fact quite elementary, with the main tool being the theory of smooth numbers (the theory of hyperelliptic equations is used elsewhere in the paper, but not for this particular result).

On the other hand, it was pointed out in a more recent article of Granville that (as with the current situation) Siegel zeroes can sometimes serve to enforce the parity barrier, rather than overcome it; the article responds to my previous statement with the comment “this claim needs to be treated with caution, since its truth depends on the context”.

After a quite hectic amount of back-end setup and “putting out fires”, the project is now running fairly smoothly, with activity coordinated on a Lean Zulip channel, all contributions going through a pull request process on Github and tracked via an issues-based Github project, and invaluable oversight provided by the other two maintainers of the project, Pietro Monticone and Shreyas Srinivas. In contrast to the prior PFR formalization project, the workflow follows standard Github practices and proceeds roughly as follows: if, during the course of the Zulip discussion, it becomes clear that some specific task needs to be done to move the project forward (e.g., to formalize in Lean the proof of an implication that had been worked out in the discussion threads), an “issue” is made (often by myself or one of the other maintainers), which other contributors can then “claim”, work on separately (using a local copy of the main Github repository), and then submit a “pull request” to merge their contribution back into the main repository. This request can then be reviewed by both maintainers and other contributors, and if approved, closes the relevant issue.

Given that the grid determines only distances, one could seek a counterexample to this by finding a set of points in the grid that avoided all of the eight patterns .

One can then ask which axioms imply which others. For instance, Equation1 implies all the other axioms in this list, which in turn imply Equation11. Equation8 implies Equation9 as a special case, which in turn implies Equation10 as a special case. The full poset of implications can be depicted by the following Hasse diagram:

More subtle are the anti-implications, in which we have to show that a law does not imply a law . In principle, one just has to exhibit a magma that obeys but not . In a large fraction of cases, one can simply search through small finite magmas – such as magmas on two, three, or four elements – to obtain this anti-implication; but they do not always suffice, and in fact we know of anti-implications that can only be proven through a construction of an infinite magma. For instance, the “Asterix law” is now known (from the efforts of this project) to not imply the “Obelix law”, but all counterexamples are necessarily infinite. Curiously, the known constructions have some affinity with the famous technique of forcing in set theory, in that we continually add “generic” elements to a (partial) magma in order to force a counterexample with certain specified properties to exist, though the constructions here are certainly far simpler than in the set-theoretic constructions.
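
As a standalone illustration of the finite-magma search (a Python sketch written for this post, not the project's Lean or Mace4 tooling), one can brute-force all binary operations on a small set and look for one satisfying a given law but not another; for example, a two-element magma already witnesses that the commutative law does not imply the associative law (Equation7 and Equation10 in the numbering above).

```python
from itertools import product

def find_separating_magma(n, holds, fails):
    """Brute-force all magmas on {0, ..., n-1}; return an operation table that
    satisfies `holds` but violates `fails`, or None if no such magma exists.

    `holds` and `fails` take (op, elements) and return True/False, where op is
    a dict mapping ordered pairs to elements.  A toy sketch, not the project's tooling.
    """
    elements = range(n)
    pairs = list(product(elements, repeat=2))
    for values in product(elements, repeat=len(pairs)):
        op = dict(zip(pairs, values))
        if holds(op, elements) and not fails(op, elements):
            return op
    return None

def commutative(op, elems):
    return all(op[x, y] == op[y, x] for x in elems for y in elems)

def associative(op, elems):
    return all(op[op[x, y], z] == op[x, op[y, z]]
               for x in elems for y in elems for z in elems)

# A 2-element magma satisfying commutativity but not associativity, i.e. a
# counterexample showing the commutative law does not imply the associative law.
print(find_separating_magma(2, holds=commutative, fails=associative))
```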

More specifically I would like to propose the following (admittedly artificial) project as a pilot to further test out this paradigm, which was inspired by a MathOverflow question from last year, and discussed somewhat further on my Mastodon account shortly afterwards.

While the project is still ongoing, I can say that I am quite satisfied with the progress accomplished to date, and that many of my hopes for such a project have already been realized. On the scientific side, we have discovered some new techniques and constructions to show that a given equational theory does not imply another one, and have also discovered some exotic algebraic structures, such as the “Asterix” and “Obelix” pair, that have interesting features, and which would likely not have been discovered by any means other than the type of systematic search conducted here. The participants are very diverse, ranging from mathematicians and computer scientists at all stages of career, to interested students and amateurs. The Lean platform has worked well in integrating both human-generated and machine-generated contributions; the latter are numerically by far the largest source of contributions, but many of the automatically generated results were first obtained in special cases by humans, and then generalized and formalized (often by different members of the project). We still make many informal mathematical arguments on the discussion thread, but they tend to be rapidly formalized in Lean, at which point disputes about correctness disappear, and we can instead focus on how best to deploy various verified techniques to tackle the remaining implications.

We have also obtained profitable mileage out of constructions of “linear” magmas in both commutative and non-commutative rings, as well as free magmas associated to “confluent” equational laws and, more generally, to laws with complete rewriting systems. As such, the number of unresolved implications continues to shrink at a steady pace, although we are not yet ready to declare victory on the project.
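For concreteness, here is a small sketch of the linear-magma idea in the simplest commutative setting, namely arithmetic modulo n; the coefficients and the two laws being compared are illustrative placeholders of my own choosing rather than equations from the project.

```python
# Sketch of the "linear magma" construction over Z/nZ: define x ◇ y = (a*x + b*y) % n
# and scan over coefficient pairs (a, b) for an operation satisfying one equational
# law but not another.  The two laws below (x ◇ x = x and x ◇ (x ◇ y) = y) are
# illustrative placeholders, not the project's actual numbered equations.

def laws_hold(n, a, b):
    op = lambda x, y: (a * x + b * y) % n
    law1 = all(op(x, x) == x for x in range(n))                           # x ◇ x = x
    law2 = all(op(x, op(x, y)) == y for x in range(n) for y in range(n))  # x ◇ (x ◇ y) = y
    return law1, law2

def search(n):
    """Look for coefficients (a, b) where the first law holds but the second fails."""
    for a in range(n):
        for b in range(n):
            law1, law2 = laws_hold(n, a, b)
            if law1 and not law2:
                return a, b
    return None

print(search(5))  # -> (1, 0): the projection x ◇ y = x satisfies x ◇ x = x but not x ◇ (x ◇ y) = y
```

The appeal of this family is that the two checks reduce to simple polynomial identities in the coefficients, so very large sets of candidate counterexamples can be scanned cheaply before resorting to more exotic constructions.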

Theorem 2 (Finite time blowup for the forced equation) There exists a smooth solution to the forced Euler equations on that exhibits finite time blowup, in which the forcing term stays uniformly bounded in for any .

One can contrast this analysis with that of the Philippine lottery in the original post. In both cases the probability of the observed event under the null hypothesis was extremely small. However, in the case of the Venezuelan election, there is a plausible causal chain that leads to an elevated probability of the observed event under the alternative hypothesis, whereas in the case of the lottery, only extremely implausible chains could be constructed that would lead to the specific outcome of a multiples-of-9 lottery draw for that specific lottery on that specific date.

Thanks to the Birkhoff completeness theorem, if one equational law implies another, then the implication can be proven by a finite number of rewrite operations; however, the chain of rewrites needed can be quite long. The implication of Equation 359 from Equation 1491 mentioned above is already moderately challenging, requiring four or five rewrites; the implication of Equation 2 from Equation 1689 is incredibly long (try it!). Nevertheless, standard automated theorem provers, such as Vampire, are quite capable of proving the vast majority of these implications.

As one can see, it is already somewhat tedious to compute the Hasse diagram of just eleven equations. The project I propose is to try to expand this Hasse diagram by a couple orders of magnitude, covering a significantly larger set of equations. The set I propose is the set of equations that use the magma operation at most four times, up to relabeling and the reflexive and symmetric axioms of equality; this includes the eleven equations above, but also many more. How many more? Recall that the Catalan number is the number of ways one can form an expression out of applications of a binary operation (applied to placeholder variables); and, given a string of placeholder variables, the Bell number is the number of ways (up to relabeling) to assign names to each of these variables, where some of the placeholders are allowed to be assigned the same name. As a consequence, ignoring symmetry, the number of equations that involve at most four operations is
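Concretely, under my reading of the convention just described, the count is the sum of C(n1) * C(n2) * B(n1+n2+2) over pairs with n1 + n2 at most 4, where C denotes the Catalan numbers and B the Bell numbers; the following short computation evaluates this sum (this is my own reconstruction of the convention, so the precise count may differ slightly from the one used in the project).

```python
# Sketch of the count described above: an equation w1 = w2 with n1 operations on the
# left and n2 on the right has C(n1)*C(n2) possible expression shapes, and B(n1+n2+2)
# ways (up to relabeling) to name the n1+n2+2 variable slots.  This is my own
# reconstruction of the displayed formula, so conventions may differ slightly from
# the project's official count.

from functools import lru_cache
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

@lru_cache(maxsize=None)
def bell(n):
    # Bell numbers via the recurrence B(n) = sum_k C(n-1, k) B(k).
    if n == 0:
        return 1
    return sum(comb(n - 1, k) * bell(k) for k in range(n))

total = sum(catalan(n1) * catalan(n2) * bell(n1 + n2 + 2)
            for n1 in range(5) for n2 in range(5) if n1 + n2 <= 4)
print(total)  # 9341 equations with at most four operations, before symmetry reduction
```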

The implications of this refined asymptotic then depend rather crucially on how the Siegel correction term is aligned with the main term, and also whether it is of comparable order or lower order. In many situations (particularly those concerning “average case” problems, in which one wants to understand the behavior for typical choices of parameters), the Siegel correction term ends up being lower order, and so one ends up with the situation described in my initial blog post, where we are able to get the predicted asymptotic in the Siegel zero case. However, as pointed out by Granville, there are other situations (particularly those involving “worst case” problems, in which some key parameter can be chosen adversarially) in which the Siegel correction term can align to completely cancel (or to highly reinforce) the main term. In such cases, the Siegel zero becomes a very concrete manifestation of the parity barrier, rather than a means to avoid it. (There is a tiny chance that there may be some sort of “repulsion” phenomenon in which having no semiprimes in for one value of somehow generates semiprimes in for another value of , which would allow one to solve the problem without having to directly address the Siegel issue, but I don’t see how two such intervals could “communicate” in order to achieve such a repulsion effect.)

The first step (ingredient (i)) is to formulate the null hypothesis , the alternative hypothesis , and the event . Here is one possible choice: is the event that the reported vote totals for Maduro, González, and Other are each equal to the nearest integer to the total number of voters multiplied by a round percentage with one decimal place (i.e., an integer multiple of ). is the null hypothesis that the vote totals were reported accurately (or with only inconsequential inaccuracies). is the alternative hypothesis that the vote totals were manipulated by officials from the incumbent administration.
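To make the event concrete, here is a small sketch of the corresponding check; the vote totals below are placeholder figures of my own choosing (arranged so that the event occurs), not the actual reported data, and the closing probability estimate is only the crude uniform-last-digits heuristic.

```python
# Sketch of the event E: each reported total equals the nearest integer to
# (total number of voters) * (a percentage that is a multiple of 0.1%).  The
# figures below are placeholders chosen for illustration, not the actual data.

def is_round_percentage(count, total):
    """Is `count` the nearest integer to total*(k/1000) for some integer k,
    i.e. to the total times a multiple of 0.1%?"""
    return any(round(total * k / 1000) == count for k in range(1001))

total_voters = 10_000_000                     # placeholder overall total
reported = [5_120_000, 4_420_000, 460_000]    # placeholder category totals (summing to the total)

print(all(is_round_percentage(c, total_voters) for c in reported))  # True for these placeholders

# Crude null-hypothesis estimate: the "round" values near a given count are spaced
# about total_voters/1000 apart, so a single count lands on one with probability
# roughly 1000/total_voters under a uniform-last-digits heuristic.  Since the three
# counts sum to the total, only two of them are effectively free, giving roughly:
print((1000 / total_voters) ** 2)             # ~1e-8 for a total of ten million voters
```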

Remark 3 One can also try to directly create a self-similar blowup to (1), (2), for instance by making the ansatz for and some fields and . This particular ansatz seems consistent with all known conservation laws; however, it works out to basically be ten vector equations (plus some additional scalar constraints) on ten vector field unknowns, so it is just barely overdetermined. I have not been able to locate a self-similar blowup ansatz that is underdetermined.

It is not clear to me at all what the geometry of will look like. Will most equations be incomparable with each other? Will it stratify into layers of “strong” and “weak” axioms? Will there be a lot of equivalent axioms? It might be interesting to record now any speculations as to what the structure of this poset will be, and to compare these predictions with the outcome of the project afterwards.

Thanks to the tireless efforts of many volunteer contributors to the project, we now have a number of nice visualization tools to inspect various portions of the (not quite completed) implication graph. For instance, this graph depicts all the consequences of Equation 1491: , which I have nicknamed the “Obelix law” (and it has a companion, the “Asterix law“, Equation 65: ). And here is a table of all the equational laws we are studying, together with a count of how many laws they imply, or are implied by. These interfaces are also somewhat integrated with Lean: for instance, you can click here to try your hand at showing that the Obelix law implies Equation 359, ; I’ll leave this as a challenge (a four-line proof in Lean is possible).

Next, we prove the upper bound on . Suppose that a sequence has partial products that are squares for some . Then we have a square for all (with the convention ). The key observation (essentially Lemma 3.4 of Bui–Pratt–Zaharescu) is that, for each , one of the following must hold: (i) At least one of the is -smooth. (ii) At least one of the is divisible by for some prime . (iii) . Indeed, suppose that neither (i) nor (ii) holds; then one of the terms in the sequence is divisible by exactly one copy of for some prime . In order for the product to be a square, another element of the sequence must also be divisible by the same prime; but this implies (iii).
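As a quick sanity check of this observation in a toy numerical setting (the sequence below is an arbitrary choice of mine, unrelated to the ranges in the actual argument), one can factor each term of a sequence whose product is a perfect square and confirm that every prime appearing to an odd power in one term also divides some other term.

```python
# Toy numerical check of the key observation: if the product of a sequence is a
# perfect square and some term is divisible by an odd power of a prime p, then some
# *other* term must also be divisible by p.  The sequence below is an arbitrary
# example of my own choosing.
from collections import Counter
from sympy import factorint

def odd_power_primes(n):
    """Primes dividing n to an odd power."""
    return {p for p, e in factorint(n).items() if e % 2 == 1}

seq = [6, 10, 15]          # product 900 = 30^2, a perfect square
product_factors = Counter()
for a in seq:
    product_factors.update(factorint(a))
assert all(e % 2 == 0 for e in product_factors.values())   # the product is indeed a square

for i, a in enumerate(seq):
    for p in odd_power_primes(a):
        others = [b for j, b in enumerate(seq) if j != i and b % p == 0]
        print(f"term {a}: prime {p} appears to an odd power; it also divides {others}")
```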

Perhaps, in analogy with formalization projects, we could have a semi-formal “blueprint” evolving in parallel with the formal Lean component of the project. This way, the project could accept human-written proofs by contributors who do not necessarily have any proficiency in Lean, as well as contributions from automated tools (such as the aforementioned Prover9 and Mace4), whose output is in some format other than Lean. The task of converting these semi-formal proofs into Lean could then be done by other humans or automated tools; in particular, I imagine modern AI tools could be particularly valuable for this portion of the workflow. I am not quite sure, though, whether existing blueprint software can scale to handle the large number of individual proofs that would be generated by this project; and as this portion would not be formally verified, a significant amount of human moderation might also be needed here, which also might not scale properly. Perhaps the semi-formal portion of the project could instead be coordinated on a forum such as this blog, in a similar spirit to past Polymath projects.


Theorem 2 (First near miss) If is sufficiently large, then there exists a subset of of cardinality which avoids all of the patterns .

Analytically, this is not a particularly pleasant equation to try to solve; one can substitute the second equation into the first to obtain a single equation

The problem is in the field of universal algebra, and concerns the (medium-scale) exploration of simple equational theories for magmas. A magma is nothing more than a set equipped with a binary operation . Initially, no additional axioms on this operation are imposed, and as such magmas by themselves are somewhat boring objects. Of course, with additional axioms, such as the identity axiom or the associative axiom, one can get more familiar mathematical objects such as groups, semigroups, or monoids. Here we will be interested in (constant-free) equational axioms, which are axioms of equality involving expressions built from the operation and one or more indeterminate variables in . Two familiar examples of such axioms are the commutative axiom

If we now use Greek indices to denote only coordinates in the “vertical” coordinate , then the velocity field becomes , and the Euler equations split as

The proofs are not particularly long or deep, but I thought I would record here some of the process towards finding them. My first step was to try to simplify the condition that (2) stays bounded. In order to use probabilistic intuition, I first expressed this condition in probabilistic terms as