You are currently browsing the tag archive for the ‘Collatz conjecture’ tag.

I have just returned from Basel, Switzerland, on the occasion of the awarding of the 2019 Ostrowski prize to Assaf Naor. I was invited to give the laudatio for Assaf’s work, which I have uploaded here. I also gave a public lecture (intended at the high school student level) at the University of Basel entitled “The Notorious Collatz conjecture”; I have uploaded the slides for that here. (Note that the slides here are somewhat unpolished as I was not initially planning to make them public until I was recently requested to do so. In particular I do not have full attribution for some of the images used in the slides.)

Basel has historically been home to a number of very prominent mathematicians, most notably Jacob Bernoulli, whose headstone I saw at the Basel Minster,

and also Leonhard Euler, for which I could not find a formal memorial, but I did at least see a hotel bearing his name:

Define the *Collatz map* on the natural numbers by setting to equal when is odd and when is even, and let denote the forward Collatz orbit of . The notorious Collatz conjecture asserts that for all . Equivalently, if we define the backwards Collatz orbit to be all the natural numbers that encounter in their forward Collatz orbit, then the Collatz conjecture asserts that . As a partial result towards this latter statement, Krasikov and Lagarias in 2003 established the bound

for all and . (This improved upon previous values of obtained by Applegate and Lagarias in 1995, by Applegate and Lagarias in 1995 by a different method, by Wirsching in 1993, by Krasikov in 1989, by Sander in 1990, and some by Crandall in 1978.) This is still the largest value of for which (1) has been established. Of course, the Collatz conjecture would imply that we can take equal to , which is the assertion that a positive density set of natural numbers obeys the Collatz conjecture. This is not yet established, although the results in my previous paper do at least imply that a positive density set of natural numbers iterates to an (explicitly computable) bounded set, so in principle the case of (1) could now be verified by an (enormous) finite computation in which one verifies that every number in this explicit bounded set iterates to . In this post I would like to record a possible alternate route to this problem that depends on the distribution of a certain family of random variables that appeared in my previous paper, that I called *Syracuse random variables*.

Definition 1 (Syracuse random variables)For any natural number , aSyracuse random variableon the cyclic group is defined as a random variable of the form

where are independent copies of a geometric random variable on the natural numbers with mean , thus

} for . In (2) the arithmetic is performed in the ring .

Thus for instance

and so forth. After reversing the labeling of the , one could also view as the mod reduction of a -adic random variable

The probability density function of the Syracuse random variable can be explicitly computed by a recursive formula (see Lemma 1.12 of my previous paper). For instance, when , is equal to for respectively, while when , is equal to

when respectively.

The relationship of these random variables to the Collatz problem can be explained as follows. Let denote the odd natural numbers, and define the *Syracuse map* by

where the –valuation is the number of times divides . We can define the forward orbit and backward orbit of the Syracuse map as before. It is not difficult to then see that the Collatz conjecture is equivalent to the assertion , and that the assertion (1) for a given is equivalent to the assertion

for all , where is now understood to range over odd natural numbers. A brief calculation then shows that for any odd natural number and natural number , one has

where the natural numbers are defined by the formula

so in particular

Heuristically, one expects the -valuation of a typical odd number to be approximately distributed according to the geometric distribution , so one therefore expects the residue class to be distributed approximately according to the random variable .

The Syracuse random variables will always avoid multiples of three (this reflects the fact that is never a multiple of three), but attains any non-multiple of three in with positive probability. For any natural number , set

Equivalently, is the greatest quantity for which we have the inequality

for all integers not divisible by three, where is the set of all tuples for which

Thus for instance , , and . On the other hand, since all the probabilities sum to as ranges over the non-multiples of , we have the trivial upper bound

There is also an easy submultiplicativity result:

*Proof:* Let be an integer not divisible by , then by (4) we have

If we let denote the set of tuples that can be formed from the tuples in by deleting the final component from each tuple, then we have

with an integer not divisible by three. By definition of and a relabeling, we then have

for all . For such tuples we then have

so that . Since

for each , the claim follows.

From this lemma we see that for some absolute constant . Heuristically, we expect the Syracuse random variables to be somewhat approximately equidistributed amongst the multiples of (in Proposition 1.4 of my previous paper I prove a fine scale mixing result that supports this heuristic). As a consequence it is natural to conjecture that . I cannot prove this, but I can show that this conjecture would imply that we can take the exponent in (1), (3) arbitrarily close to one:

Proposition 3Suppose that (that is to say, as ). Thenas , or equivalently

I prove this proposition below the fold. A variant of the argument shows that for any value of , (1), (3) holds whenever , where is an explicitly computable function with as . In principle, one could then improve the Krasikov-Lagarias result by getting a sufficiently good upper bound on , which is in principle achievable numerically (note for instance that Lemma 2 implies the bound for any , since for any ).

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let denote the positive integers (with the natural numbers), and let be the map defined by setting equal to when is odd and when is even. Let be the minimal element of the Collatz orbit . Then we have

Conjecture 1 (Collatz conjecture)One has for all .

Establishing the conjecture for all remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” in some sense. For instance, it is a result of Krasikov and Lagarias that

for all sufficiently large . In another direction, it was shown by Terras that for almost all (in the sense of natural density), one has . This was then improved by Allouche to for almost all and any fixed , and extended later by Korec to cover all . In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2Let be any function with . Then we have for almost all (in the sense of logarithmic density).

Thus for instance one has for almost all (in the sense of logarithmic density).

The difficulty here is one usually only expects to establish “local-in-time” results that control the evolution for times that only get as large as a small multiple of ; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get all the way down to one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state .

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an *invariant measure* for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if one picks at random an integer from a large interval , then in most cases, the orbit of will eventually move into the interval . Similarly, if one picks an integer at random from , then in most cases, the orbit of will eventually move into . It is then tempting to concatenate the two statements and conclude that for most in , the orbit will eventually move . Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn reaches , the distribution of the final value is unlikely to be close to being uniformly distributed on , and in particular could potentially concentrate almost entirely in the exceptional set of that do not make it into . The point here is the uniform measure on is not transported by Collatz dynamics to anything resembling the uniform measure on .

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the *Syracuse map* , defined on the odd numbers by setting , where is the largest power of that divides . (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of at each iteration step, which makes the map better behaved when performing “-adic” analysis.)

When viewed -adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, is never divisible by . A little less obviously, is twice as likely to equal mod as it is to equal mod . This is because for a randomly chosen odd , the number of times that divides can be seen to have a geometric distribution of mean – it equals any given value with probability . Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of . For instance, one can compute that for large random odd , will take the residue classes with probabilities

respectively. More generally, for any , will be distributed according to the law of a random variable on that we call a *Syracuse random variable*, and can be described explicitly as

where are iid copies of a geometric random variable of mean .

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this -adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables to construct such a measure, but only if these random variables stabilise in the limit in a certain total variation sense. More precisely, in the paper we establish the estimate

for any and any . This type of stabilisation is plausible from entropy heuristics – the tuple of geometric random variables that generates has Shannon entropy , which is significantly larger than the total entropy of the uniform distribution on , so we expect a lot of “mixing” and “collision” to occur when converting the tuple to ; these heuristics can be supported by numerics (which I was able to work out up to about before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

are all distinct as vary over tuples in . Unfortunately, the process of reducing mod creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

are mostly distinct for “typical” (as drawn using the geometric distribution) as long as is a bit smaller than (basically because the rational number appearing in (3) then typically takes a form like with an integer between and ). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of of density less than for some large absolute constant ). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of , and more precisely in showing that

for any and any that is not divisible by .

If the random variable (1) was the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic factor does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming is even for sake of discussion) as

where . The point here is that after conditioning on the to be fixed, the random variables remain independent (though the distribution of each depends on the value that we conditioned to), and so the above expression is a *conditional* sum of independent random variables. This lets one express the characeteristic function of (1) as an *averaged* Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

is not close to an integer for a moderately large number (, to be precise) of indices . (Actually, for technical reasons we have to also restrict to those for which , but let us ignore this detail here.) To put it another way, if we let denote the set of pairs for which

we have to show that (with overwhelming probability) the random walk

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of .

A little bit of elementary number theory and combinatorics allows one to describe the set as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of after passing through any given such triangle, and it then becomes relatively easy to then show that the renewal process usually has the required number of points outside of . The most difficult case is when the renewal process passes through a particularly large triangle in . However, it turns out that large triangles enjoy particularly good separation properties, and in particular afer passing through a large triangle one is likely to only encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of that one can finish the proof of (4), and thus Theorem 2.

One of the most notorious problems in elementary mathematics that remains unsolved is the Collatz conjecture, concerning the function defined by setting when is odd, and when is even. (Here, is understood to be the positive natural numbers .)

Conjecture 1 (Collatz conjecture)For any given natural number , the orbit passes through (i.e. for some ).

Open questions with this level of notoriety can lead to what Richard Lipton calls “mathematical diseases” (and what I termed an unhealthy amount of obsession on a single famous problem). (See also this xkcd comic regarding the Collatz conjecture.) As such, most practicing mathematicians tend to spend the majority of their time on more productive research areas that are only just beyond the range of current techniques. Nevertheless, it can still be diverting to spend a day or two each year on these sorts of questions, before returning to other matters; so I recently had a go at the problem. Needless to say, I didn’t solve the problem, but I have a better appreciation of why the conjecture is (a) plausible, and (b) unlikely be proven by current technology, and I thought I would share what I had found out here on this blog.

Let me begin with some very well known facts. If is odd, then is even, and so . Because of this, one could replace by the function , defined by when is odd, and when is even, and obtain an equivalent conjecture. Now we see that if one chooses “at random”, in the sense that it is odd with probability and even with probability , then increases by a factor of roughly half the time, and decreases it by a factor of half the time. Furthermore, if is uniformly distributed modulo , one easily verifies that is uniformly distributed modulo , and so should be roughly times as large as half the time, and roughly times as large as the other half of the time. Continuing this at a heuristic level, we expect generically that half the time, and the other half of the time. The logarithm of this orbit can then be modeled heuristically by a random walk with steps and occuring with equal probability. The expectation

is negative, and so (by the classic gambler’s ruin) we expect the orbit to decrease over the long term. This can be viewed as heuristic justification of the Collatz conjecture, at least in the “average case” scenario in which is chosen uniform at random (e.g. in some large interval ). (It also suggests that if one modifies the problem, e.g. by replacing to , then one can obtain orbits that tend to increase over time, and indeed numerically for this variant one sees orbits that appear to escape to infinity.) Unfortunately, one can only rigorously keep the orbit uniformly distributed modulo for time about or so; after that, the system is too complicated for naive methods to control at anything other than a heuristic level.

Remark 1One can obtain a rigorous analogue of the above arguments by extending from the integers to the -adics . This compact abelian group comes with a Haar probability measure, and one can verify that this measure is invariant with respect to ; with a bit more effort one can verify that it is ergodic. This suggests the introduction of ergodic theory methods. For instance, using the pointwise ergodic theorem, we see that if is a random-adicinteger, then almost surely the orbit will be even half the time and odd half the time asymptotically, thus supporting the above heuristics. Unfortunately, this does not directly tell us much about the dynamics on , as this is a measure zero subset of . More generally, unless a dynamical system is somehow “polynomial”, “nilpotent”, or “unipotent” in nature, the current state of ergodic theory is usually only able to say something meaningful aboutgenericorbits, but not aboutallorbits. For instance, the very simple system on the unit circle is well understood from ergodic theory (in particular, almost all orbits will be uniformly distributed), but the orbit of a specific point, e.g. , is still nearly impossible to understand (this particular problem being equivalent to the notorious unsolved question of whether the digits of are uniformly distributed).

The above heuristic argument only suggests decreasing orbits for *almost all* (though even this remains unproven, the state of the art is that the number of in that eventually go to is , a result of Krasikov and Lagarias). It leaves open the possibility of some very rare exceptional for which the orbit goes to infinity, or gets trapped in a periodic loop. Since the only loop that lies in is (for ) or (for ), we thus may isolate a weaker consequence of the Collatz conjecture:

Conjecture 2 (Weak Collatz conjecture)Suppose that is a natural number such that for some . Then is equal to , , or .

Of course, we may replace with (and delete ““) and obtain an equivalent conjecture.

This weaker version of the Collatz conjecture is also unproven. However, it was observed by Bohm and Sontacchi that this weak conjecture is equivalent to a divisibility problem involving powers of and :

Conjecture 3 (Reformulated weak Collatz conjecture)There does not exist and integerssuch that is a positive integer that is a proper divisor of

*Proof:* To see this, it is convenient to reformulate Conjecture 2 slightly. Define an equivalence relation on by declaring if for some integer , thus giving rise to the quotient space of equivalence classes (which can be placed, if one wishes, in one-to-one correspondence with the odd natural numbers). We can then define a function by declaring

for any , where is the largest power of that divides . It is easy to see that is well-defined (it is essentially the *Syracuse function*, after identifying with the odd natural numbers), and that periodic orbits of correspond to periodic orbits of or . Thus, Conjecture 2 is equivalent to the conjecture that is the only periodic orbit of .

Now suppose that Conjecture 2 failed, thus there exists such that for some . Without loss of generality we may take to be odd, then . It is easy to see that is the only fixed point of , and so . An easy induction using (2) shows that

where, for each , is the largest power of that divides

In particular, as is odd, . Using the recursion

we see from induction that divides , and thus :

Since , we have

for some integer . Since is divisible by , and is odd, we conclude ; if we rearrange the above equation as (1), then we obtain a counterexample to Conjecture 3.

Conversely, suppose that Conjecture 3 failed. Then we have , integers

and a natural number such that (1) holds. As , we see that the right-hand side of (1) is odd, so is odd also. If we then introduce the natural numbers by the formula (3), then an easy induction using (4) shows that

with the periodic convention for . As the are increasing in (even for ), we see that is the largest power of that divides the right-hand side of (5); as is odd, we conclude that is also the largest power of that divides . We conclude that

and thus is a periodic orbit of . Since is an odd number larger than , this contradicts Conjecture 3.

Call a *counterexample* a tuple that contradicts Conjecture 3, i.e. an integer and an increasing set of integers

such that (1) holds for some . We record a simple bound on such counterexamples, due to Terras and to Garner :

Lemma 5 (Exponent bounds)Let , and suppose that the Collatz conjecture is true for all . Let be a counterexample. Then

*Proof:* The first bound is immediate from the positivity of . To prove the second bound, observe from the proof of Proposition 4 that the counterexample will generate a counterexample to Conjecture 2, i.e. a non-trivial periodic orbit . As the conjecture is true for all , all terms in this orbit must be at least . An inspection of the proof of Proposition 4 reveals that this orbit consists of steps of the form , and steps of the form . As all terms are at least , the former steps can increase magnitude by a multiplicative factor of at most . As the orbit returns to where it started, we conclude that

whence the claim.

The Collatz conjecture has already been verified for many values of (up to at least , according to this web site). Inserting this into the above lemma, one can get lower bounds on . For instance, by methods such as this, it is known that any non-trivial periodic orbit has length at least , as shown in Garner’s paper (and this bound, which uses the much smaller value that was available in 1981, can surely be improved using the most recent computational bounds).

Now we can perform a heuristic count on the number of counterexamples. If we fix and , then , and from basic combinatorics we see that there are different ways to choose the remaining integers

to form a potential counterexample . As a crude heuristic, one expects that for a “random” such choice of integers, the expression (1) has a probability of holding for some integer . (Note that is not divisible by or , and so one does not expect the special structure of the right-hand side of (1) with respect to those moduli to be relevant. There will be some choices of where the right-hand side in (1) is too small to be divisible by , but using the estimates in Lemma 5, one expects this to occur very infrequently.) Thus, the total expected number of solutions for this choice of is

The heuristic number of solutions overall is then expected to be

where, in view of Lemma 5, one should restrict the double summation to the heuristic regime , with the approximation here accurate to many decimal places.

We need a lower bound on . Here, we will use Baker’s theorem (as discussed in this previous post), which among other things gives the lower bound

for some absolute constant . Meanwhile, Stirling’s formula (as discussed in this previous post) combined with the approximation gives

where is the *entropy function*

A brief computation shows that

and so (ignoring all subexponential terms)

which makes the series (6) convergent. (Actually, one does not need the full strength of Lemma 5 here; anything that kept well away from would suffice. In particular, one does not need an enormous value of ; even (say) would be more than sufficient to obtain the heuristic that there are finitely many counterexamples.) Heuristically applying the Borel-Cantelli lemma, we thus expect that there are only a finite number of counterexamples to the weak Collatz conjecture (and inserting a bound such as , one in fact expects it to be extremely likely that there are no counterexamples at all).

This, of course, is far short of any rigorous proof of Conjecture 2. In order to make rigorous progress on this conjecture, it seems that one would need to somehow exploit the structural properties of numbers of the form

In some very special cases, this can be done. For instance, suppose that one had with at most one exception (this is essentially what is called a *-cycle* by Steiner). Then (8) simplifies via the geometric series formula to a combination of just a bounded number of powers of and , rather than an unbounded number. In that case, one can start using tools from transcendence theory such as Baker’s theorem to obtain good results; for instance, in the above-referenced paper of Steiner, it was shown that -cycles cannot actually occur, and similar methods have been used to show that -cycles (in which there are at most exceptions to ) do not occur for any , as was shown by Simons and de Weger. However, for general increasing tuples of integers , there is no such representation by bounded numbers of powers, and it does not seem that methods from transcendence theory will be sufficient to control the expressions (8) to the extent that one can understand their divisibility properties by quantities such as .

Amusingly, there is a slight connection to Littlewood-Offord theory in additive combinatorics – the study of the random sums

generated by some elements of an additive group , or equivalently, the vertices of an -dimensional parallelepiped inside . Here, the relevant group is . The point is that if one fixes and (and hence ), and lets vary inside the simplex

then the set of all sums of the form (8) (viewed as an element of ) contains many large parallelepipeds. (Note, incidentally, that once one fixes , all the sums of the form (8) are distinct; because given (8) and , one can read off as the largest power of that divides (8), and then subtracting off one can then read off , and so forth.) This is because the simplex contains many large cubes. Indeed, if one picks a typical element of , then one expects (thanks to Lemma 5) that there there will be indices such that for , which allows one to adjust each of the independently by if desired and still remain inside . This gives a cube in of dimension , which then induces a parallelepiped of the same dimension in . A short computation shows that the generators of this parallelepiped consist of products of a power of and a power of , and in particular will be coprime to .

If the weak Collatz conjecture is true, then the set must avoid the residue class in . Let us suppose temporarily that we did not know about Baker’s theorem (and the associated bound (7)), so that could potentially be quite small. Then we would have a large parallelepiped inside a small cyclic group that did not cover all of , which would not be possible for small enough. Indeed, an easy induction shows that a -dimensional parallelepiped in , with all generators coprime to , has cardinality at least . This argument already shows the lower bound . In other words, we have

Proposition 6Suppose the weak Collatz conjecture is true. Then for any natural numbers with , one has .

This bound is very weak when compared against the unconditional bound (7). However, I know of no way to get a nontrivial separation property between powers of and powers of other than via transcendence theory methods. Thus, this result strongly suggests that any proof of the Collatz conjecture must either use existing results in transcendence theory, or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve the Collatz conjecture.)

By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7)):

Proposition 7Suppose the weak Collatz conjecture is true. Then for any natural numbers with , one has for some absolute constant .

*Proof:* (Informal sketch only) Suppose not, then we can find with of size . We form the set as before, which contains parallelepipeds in of large dimension that avoid . We can count the number of times occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see Chapter 7 of my book with Van Vu, or this recent preprint of Maples). Using this Fourier representation, the fact that this parallelepiped avoids (and the fact that ) forces the generators to be concentrated in a *Bohr set*, in that one can find a non-zero frequency such that of the generators lie in the set . However, one can choose the generators to essentially have the structure of a (generalised) geometric progression (up to scaling, it resembles something like for ranging over a generalised arithmetic progression, and a fixed irrational), and one can show that such progressions cannot be concentrated in Bohr sets (this is similar in spirit to the exponential sum estimates of Bourgain on approximate multiplicative subgroups of , though one can use more elementary methods here due to the very strong nature of the Bohr set concentration (being of the “ concentration” variety rather than the “ concentration”).). This furnishes the required contradiction.

Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful enough to create exponential separation between powers of and powers of .

Unfortunately, once one uses the transcendence theory bound (7), the size of the cyclic group becomes larger than the volume of any cube in , and Littlewood-Offord techniques are no longer of much use (they can be used to show that is highly equidistributed in , but this does not directly give any way to prevent from containing ).

One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdos asserting that for , the base representation of contains at least one . (See this paper of Lagarias for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to

with and . (When , of course, one has .) In this form we see a resemblance to Conjecture 3, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that one has a similar heuristic support for this conjecture as one does for Proposition 3; a number of magnitude has about base digits, so the heuristic probability that none of these digits are equal to is , which is absolutely summable.

## Recent Comments