
Excellent numbers

Compute excellent numbers


Not the phone number website excellentnumbers.com

The project

I started computing some interesting numbers by brute force in Perl. As the numbers got bigger, Perl got slower and slower. I optimized the algorithm and that worked for a while, but that only went so far. For much bigger numbers, I had to drop into C. Once in C, I had to go wide. I've got a good case study for improving the performance of a program.

—brian d foy <[email protected]>

The numbers

The full list is in the excellent.txt file in the repo, but we also tweeted them as they came in (albeit out of sequence). I've constructed a list of over 350 excellent numbers, most of which were not previously discovered. Matthew Arcus blew us away by discovering over 238,000 excellent numbers.

The math

A number $n$ is excellent if you can break its digits into equal length halves $a$ and $b$ such that $b^2 - a^2$ is $n$. I represent $n$ as $ab$ where $a$ and $b$ represent an equal number of digits of $n$ (and not the product of $a$ and $b$).

For example, for $530901$:

$$ \begin{align} 530901 & \rightarrow \overset{a}{530} \quad \overset{b}{901} \\ b^2 - a^2 & = 901^2 - 530^2 \\ & = 811801 - 280900 \\ & = 530901 \end{align} $$

Represent the number $n$ as $ab$ where that is the concatenation of digits in $a$ and $b$ and not the product of them. Thus, $n$ is $a \cdot 10^k + b$, where $k$ is the number of digits in $a$. Given that, an excellent number is one where $b^2 - a^2 = a \cdot 10^k + b$.
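The definition translates almost directly into code. This is a minimal sketch in Python rather than the Perl or C used in the project; the name `is_excellent` is borrowed from the discussion of the first algorithm, and its signature here is an assumption.

```python
def is_excellent(n: int) -> bool:
    """Return True if n = ab with equal-length halves and b**2 - a**2 == n."""
    digits = str(n)
    if n <= 0 or len(digits) % 2 != 0:
        return False          # odd-length numbers cannot be split evenly
    k = len(digits) // 2      # number of digits in each half
    a, b = int(digits[:k]), int(digits[k:])
    return b * b - a * a == a * 10**k + b


print(is_excellent(530901))   # the worked example: 901**2 - 530**2
```

Python integers have arbitrary precision, so the same check works for numbers far beyond 64 bits, only slowly.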

An interesting pattern

This pyramid is a pattern of numbers that are excellent:

$$ \begin{align} & 4 \: 8 \\ 3 \: & 4 \: 6 \: 8 \\ 33 \: & 4 \: 66 \: 8 \\ 333 \: & 4 \: 666 \: 8 \\ 3333 \: & 4 \: 6666 \: 8 \\ 33333 \: & 4 \: 66666 \: 8 \\ 333333 \: & 4 \: 666666 \: 8 \\ 3333333 \: & 4 \: 6666666 \: 8 \end{align} $$

These numbers are of the form of a summation of powers of 10 with a little extra added to the $10^0$ term. Let $k$ be the number of decimal digits in $a$ (or $b$):

$$ \begin{align} a & = 3 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 \\ b & = 6 ( \sum\limits_{i=0}^{k-1} 10^i ) + 2 \end{align} $$

But $b$ is really just double $a$, which you see by inspection even without the fancy symbols:

$$ \begin{align} a & = & 3 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 \\ b & = 2 ( & 3 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 \: \: ) \\ b & = 2 a \end{align} $$

Call that summation (with a little extra) $N$:

$$ N = 3 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 $$

The number $ab$ is a multiple of $N$. Shift the digits of $a$ over $k$ powers of $10$ to give them the right magnitude:

$$ \begin{align} ab & = 10^k N + 2 N \\ & = ( 10^k + 2 ) N \end{align} $$

The term $10^k + 2$ doesn't look that interesting at first, but it's also $9$ repeated $k$ times (with a little extra little extra):

$$ \begin{align} 10^{k} & = 9 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 \\ 10^{k} + 2 & = 9 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 + 2 \\ & = 9 ( \sum\limits_{i=0}^{k-1} 10^i ) + 3 \\ & = 3 ( 3 ( \sum\limits_{i=0}^{k-1} 10^i ) + 1 ) \\ & = 3 N \end{align} $$

Put this back into $ab$:

$$ \begin{align} ab & = ( 10^k + 2 ) N\\ & = ( 3 N ) N\\ & = 3 N^2 \end{align} $$

The difference in squares also reduces to $3 N^2$:

$$ \begin{align} b^2 - a^2 & = (2N)^2 - N^2 \\ & = 2^2 N^2 - N^2 \\ & = ( 2^2 - 1 ) N^2 \\ & = 3 N^2 \\ & = ab \end{align} $$

Both sides of $b^2 - a^2 = ab$ are $3 N^2$, so all numbers of this form are excellent. This includes the number $48$, which is the case $k = 1$, where the sum has only its $i = 0$ term. This also means that there is at least one excellent number of every even length $2k$, so there are infinitely many excellent numbers.
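The pyramid is easy to regenerate and check numerically. The following is a sketch in Python (not the project's code); `pyramid_number` is a hypothetical helper that builds $N$ from the repunit $\sum 10^i = (10^k - 1)/9$.

```python
def pyramid_number(k: int) -> int:
    """Build the 2k-digit excellent number from the pyramid pattern."""
    repunit = (10**k - 1) // 9      # 1, 11, 111, ...
    N = 3 * repunit + 1             # 4, 34, 334, ...
    a, b = N, 2 * N                 # b is double a
    n = a * 10**k + b               # concatenate the halves
    assert n == 3 * N * N           # ab = 3 N^2
    assert b * b - a * a == n       # the excellent-number condition
    return n


for k in range(1, 6):
    print(k, pyramid_number(k))
```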

Odd repetitions of an excellent number are excellent

If a number $ab$ is excellent, that number concatenated an odd number of times is also excellent. For example, the number $3468$ is excellent, so $346834683468$ is excellent.

Let $n'$ be a number of the form $abab \ldots ab$ where there are an odd number of repetitions, $r$, of $ab$, where $k$ is the number of decimal digits in $a$. In that case:

$$ \begin{align} abab \ldots ab & = a'b' = ab \sum\limits_{i=0}^{r-1} 10^{2ik} \\ \end{align} $$

Let $a'$ be $ab \ldots aba$ (the first half of the digits) and $b'$ be $ba \ldots bab$ (the second half of the digits). The difference between the squares is then:

$$ \begin{align} b'^2 - a'^2 & = (ba \ldots bab)^2 - (ab \ldots aba)^2 \\ & = ( b \sum\limits_{i=0}^{(r-1)/2} 10^{2ik} + a \sum\limits_{i=0}^{(r-3)/2} 10^{(2i+1)k} )^2 - ( a \sum\limits_{i=0}^{(r-1)/2} 10^{2ik} + b \sum\limits_{i=0}^{(r-3)/2} 10^{(2i+1)k} )^2 \\ \end{align} $$

All those series are annoying, so represent them as $S'$ and $S''$:

$$ \begin{align} S' & = \sum\limits_{i=0}^{(r-1)/2} 10^{2ik} \\ S'' & = \sum\limits_{i=0}^{(r-3)/2} 10^{(2i+1)k} \\ \end{align} $$

This now looks much more tractable. Expand the squares and refactor:

$$ \begin{align} b'^2 - a'^2 & = ( b S' + a S'' )^2 - ( a S' + b S'' )^2 \\ & = ( b^2 S'^2 + 2a \cdot b S' S'' + a^2 S''^2 ) - ( a^2 S'^2 + 2a \cdot b S' S'' + b^2 S''^2 ) \\ & = ( b^2 S'^2 + a^2 S''^2 ) - ( a^2 S'^2 + b^2 S''^2 ) \\ & = ( b^2 S'^2 - b^2 S''^2 ) + ( a^2 S''^2 - a^2 S'^2 ) \\ & = b^2 ( S'^2 - S''^2 ) - a^2 ( S'^2 - S''^2 ) \\ & = ( b^2 - a^2 )( S'^2 - S''^2 ) \\ & = ab ( S'^2 - S''^2 ) \\ \end{align} $$

The trick now is to show that the difference in the squares of these series is the series we started with. Let $r' = (r-1)/2$:

$$ \begin{align} ( S'^2 - S''^2 ) & = (\sum\limits_{i=0}^{(r-1)/2} 10^{2ik} )^2 - (\sum\limits_{i=0}^{(r-3)/2} 10^{(2i+1)k})^2 \\ & = (\sum\limits_{i=0}^{r'} 10^{2ik} )^2 - (\sum\limits_{i=0}^{r'-1} 10^{(2i+1)k})^2 \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{2ik}10^{2jk} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j=0}^{r'-1} 10^{(2i+1)k} 10^{(2j+1)k} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j=0}^{r'-1} 10^{(i+j+1)2k} \\ \end{align} $$

Let $j' = j + 1$ to start the path to making the second double summation look like the first one. This misses the $j' = 0$ case, but that comes back by adding and subtracting a summation. Do the same for the $i = r'$ case. This transforms the second double summation to look just like the first one, so the two cancel:

$$ \begin{align} ( S'^2 - S''^2 ) & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j=0}^{r'-1} 10^{(i+j+1)2k} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j'=1}^{r'} 10^{(i+j')2k} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j'=1}^{r'} 10^{(i+j')2k} + \sum\limits_{i=0}^{r'-1} 10^{2ik} - \sum\limits_{i=0}^{r'-1} 10^{2ik} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j'=0}^{r'} 10^{(i+j')2k} + \sum\limits_{i=0}^{r'-1} 10^{2ik} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'-1} \sum\limits_{j'=0}^{r'} 10^{(i+j')2k} + \sum\limits_{i=0}^{r'-1} 10^{2ik} + \sum\limits_{j'=0}^{r'} 10^{(r' + j')2k} - \sum\limits_{j'=0}^{r'} 10^{(r' + j')2k} \\ & = \sum\limits_{i=0}^{r'} \sum\limits_{j=0}^{r'} 10^{(i+j)2k} - \sum\limits_{i=0}^{r'} \sum\limits_{j'=0}^{r'} 10^{(i+j')2k} + \sum\limits_{i=0}^{r'-1} 10^{2ik} + \sum\limits_{j'=0}^{r'} 10^{(r' + j')2k} \\ & = \sum\limits_{i=0}^{r'-1} 10^{2ik} + \sum\limits_{j'=0}^{r'} 10^{(r' + j')2k} \\ & = \sum\limits_{i=0}^{r'-1} 10^{2ik} + \sum\limits_{j'=r'}^{2r'} 10^{j'2k} \\ & = \sum\limits_{i=0}^{r'-1} 10^{2ik} + \sum\limits_{i=r'}^{2r'} 10^{2ik} \\ & = \sum\limits_{i=0}^{2r'} 10^{2ik} \\ & = \sum\limits_{i=0}^{r-1} 10^{2ik} \\ \end{align} $$
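The index shuffling is easy to get wrong, so a numeric spot check is reassuring. This sketch in Python (with a made-up helper name) builds $S'$, $S''$, and the full series for a given odd $r$ and compares the two sides.

```python
def series_identity_holds(r: int, k: int) -> bool:
    """Check S'^2 - S''^2 == sum_{i=0}^{r-1} 10^(2ik) for odd r."""
    assert r % 2 == 1, "r must be odd"
    rp = (r - 1) // 2                                        # r' in the text
    s1 = sum(10**(2 * i * k) for i in range(rp + 1))         # S'
    s2 = sum(10**((2 * i + 1) * k) for i in range(rp))       # S''
    full = sum(10**(2 * i * k) for i in range(r))            # right-hand side
    return s1 * s1 - s2 * s2 == full


print(all(series_identity_holds(r, k)
          for r in (1, 3, 5, 7, 9)
          for k in (1, 2, 3, 4)))
```

A spot check like this is not a proof, but it catches an off-by-one in a summation limit immediately.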

Finally, an odd number of repetitions of $ab$ is also excellent:

$$ \begin{align} b'^2 - a'^2 & = ab ( S'^2 - S''^2 ) \\ & = ab \sum\limits_{i=0}^{r-1} 10^{2ik} \\ & = a'b' \end{align} $$

This again shows that if there is an excellent number, there is an infinite number of them. But, this also leads to the next interesting pattern.
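The repetition property can also be checked directly. This is a small Python sketch under the same definition of excellence; `repeat` and the local `is_excellent` are illustrative helpers, not the project's code.

```python
def is_excellent(n: int) -> bool:
    """Split n into equal halves a, b and test b**2 - a**2 == n."""
    digits = str(n)
    if n <= 0 or len(digits) % 2 != 0:
        return False
    k = len(digits) // 2
    a, b = int(digits[:k]), int(digits[k:])
    return b * b - a * a == n


def repeat(n: int, r: int) -> int:
    """Concatenate the decimal digits of n with themselves r times."""
    return int(str(n) * r)


for r in range(1, 8):
    print(r, is_excellent(repeat(3468, r)))
```

Even repetition counts fail for a simple reason: the two halves are then identical, so $b'^2 - a'^2 = 0$.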

A pattern in multiples of six

I've noticed this pattern that popped out after finding the 30-digit number:

 (6)     2      1     6     5      1 3
(12)   3 2    0 1    666    5 0    1 33
(18)  33 2   00 1   66666   5 00   1 333
(24) 333 2  000 1  6666666  5 000  1 3333
(30)3333 2 0000 1 666666666 5 0000 1 33333

These numbers have the form $a = 3_n20_n16_{n+1}$ and $b = 6_n50_n13_{n+1}$ where $n$ signifies a repetition of that digit. In general, I can write these numbers as:

$$ \begin{align} a & = ( 3 (\sum\limits_{i=0}^{n} 10^i) - 1 ) \cdot 10^{2n+2} + 10^{n+1} + 6 (\sum\limits_{i=0}^n 10^i) \\ b & = ( 6 (\sum\limits_{i=0}^{n} 10^i) - 1 ) \cdot 10^{2n+2} + 10^{n+1} + 3 (\sum\limits_{i=0}^n 10^i) \end{align} $$

Define $S$ to stand in for the summations:

$$ S = 3 (\sum\limits_{i=0}^{n} 10^i) $$

I can rewrite the $10^{n+1}$ terms with $S$:

$$ \begin{align} 10^{n+1} & = 9 (\sum\limits_{i=0}^{n} 10^i) + 1 \\ & = 3S + 1 \end{align} $$

Now $a$ and $b$ are polynomials of $S$:

$$ \begin{align} a & = ( S - 1 ) \cdot (3S+1)^2 + (3S+1) + 2 S \\ b & = ( 2 S - 1 ) \cdot (3S+1)^2 + (3S+1) + S \end{align} $$

Tedious expansion and combination of terms shows these numbers are excellent:

$$ \begin{align} b^2 - a^2 & = 81 S^5 (3S+2) \\ a \cdot 10^{3n+3} + b & = a \cdot (3S+1)^3 + b \\ & = 81 S^5 (3S+2) \\ & = b^2 - a^2 \end{align} $$

That $(3S+2)$ is interesting: it's $10^{n+1} + 1$ (that is, $11$, $101$, $1001$, and so on), and $S$ itself is three times a repunit.
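Rather than trusting the tedious expansion, the closed forms can be checked numerically. The sketch below is in Python; `six_pattern` is a hypothetical name, and the formulas are copied from the polynomial forms of $a$ and $b$ in $S$.

```python
def six_pattern(n: int) -> tuple:
    """Return (a, b) for the pattern with 3n+3 digits in each half."""
    S = 3 * ((10**(n + 1) - 1) // 9)          # 3, 33, 333, ...
    ten = 3 * S + 1                           # equals 10**(n+1)
    a = (S - 1) * ten**2 + ten + 2 * S
    b = (2 * S - 1) * ten**2 + ten + S
    number = a * ten**3 + b                   # concatenation ab
    assert b * b - a * a == number            # excellent
    assert number == 81 * S**5 * (3 * S + 2)  # the factored form
    assert 3 * S + 2 == 10**(n + 1) + 1
    return a, b


for n in range(5):
    print(six_pattern(n))
```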

The other numbers have some interesting patterns in their factorizations:

216513: 3 3 3 3 3 3 3 3 3 11
	216: 2 2 2 3 3 3
	513:       3 3 3 19

320166650133: 3 3 3 3 3 3 3 3 3 11 11 11 11 11 101
	320166: 2 3 3 3 7 7 11 11
	650133:   3 3 3     11 11 199

332001666665001333: 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 11 13 37 37 37 37 37
	332001666: 2 3 3 3 3 3 37 37 499
	665001333:   3 3 3 3 3 37 37     1999

333200016666666500013333: 3 3 3 3 3 3 3 3 3 11 11 11 11 11 73 101 101 101 101 101 137
	333200016666: 2 3 3 3   11 11 101 101      4999
	666500013333:   3 3 3 7 11 11 101 101 2857

The First Algorithm

These numbers are an interesting programming exercise, although most people stop programming before they get to the interesting bits. If you apply brute force and check every number, you can easily do 10- or 12-digit numbers. After that things start to get slow. When their programs get slow, people find something else to do, leaving the good parts for those who stick with it.

You don't have to check every number. Instead of taking $ab$ and applying some is_excellent() function to it, you can start with $a$ and determine if some $b$ exists that makes $ab$ excellent.

By checking $a$, you eliminate $10^k - 1$ cases. Instead of checking:

$$530 \; 000, 530 \; 001, 530 \; 002, \cdots 530 \; 999$$

you use $530$ to find one value of $b$ that might work.

To do this, you can rearrange the equation. First, represent the single value $ab$ as a sum that separates the digits of $a$ from the digits of $b$, where $k$ is the number of digits in $a$:

$$ \begin{align} b^2 - a^2 & = ab \\ & = a 10^k + b \end{align} $$

Collecting terms and rearranging:

$$ \begin{align} a^2 + a 10^k & = b^2 - b \\ & = b ( b - 1 ) \end{align} $$

We know the value of $a$ because we chose it, and we know the value of $a^2 + a 10^k$. We have to find the value of $b$ where $b ( b - 1 )$ is the same value. For large enough $b$, $b ( b - 1 )$ is almost the same as $b^2$, so we can easily estimate

$$b \approx \sqrt{a^2 + a 10^k}$$

Since $(b - 1)^2 < b ( b - 1 ) < b^2$, the only possible $b$ is the next integer above that square root.
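Here is one way to turn that square root into an exact test, sketched in Python with integer arithmetic only. The helper name `find_b` is made up for illustration. It relies on $(b-1)^2 < b(b-1) < b^2$ for $b > 1$, so the integer square root of $a^2 + a 10^k$ must be exactly $b - 1$ if a solution exists.

```python
from math import isqrt


def find_b(a: int, k: int):
    """Return the k-digit b that makes ab excellent, or None."""
    target = a * a + a * 10**k        # must equal b * (b - 1)
    b = isqrt(target) + 1             # the only possible candidate
    if b * (b - 1) == target and b < 10**k:
        return b
    return None


print(find_b(530, 3))   # 901, from the worked example 530901
print(find_b(531, 3))   # None
```

Using `isqrt` avoids floating point entirely, which matters once $a^2$ no longer fits in a double's 53-bit mantissa.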

Furthermore, for $b^2 - a^2$, there's some maximum value of $a$ past which there is no longer a possible value of $b$. The largest $b$ comprises only the digit $9$ repeated $k$ times. If $b$ has four digits, its maximum value is $9999$. The largest useful $a$ is the one just below the solution of $9999^2 - a^2 = a 10^k + 9999$; past that, even the largest $b$ is too small.

By guessing through bisection, we can find this maximum value, which is a little less than $6.2 \cdot 10^{k-1}$.
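The bisection can be sketched as follows. This is illustrative Python, not the project's code, and `max_a` is a hypothetical name. It looks for the largest $a$ with $a^2 + a 10^k \le b_{max} ( b_{max} - 1 )$, where $b_{max}$ is $k$ nines.

```python
def max_a(k: int) -> int:
    """Largest a for which some b of at most k digits could still work."""
    shift = 10**k
    b_max = shift - 1
    limit = b_max * (b_max - 1)       # largest reachable b * (b - 1)

    lo, hi = 0, shift                 # invariant: f(lo) <= limit < f(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid + mid * shift <= limit:
            lo = mid
        else:
            hi = mid
    return lo


for k in range(1, 7):
    print(k, max_a(k))
```

For $k = 4$ this gives $6178$. The ratio to $10^k$ approaches $(\sqrt{5} - 1)/2 \approx 0.618$, the positive root of $x^2 + x = 1$.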

But it gets better. We can immediately discount some values of $a$. First, $a$ must be even and not end in $2$ or $8$. We know this when we look at the values mod 10:

$$ \begin{align} ( b^2 - a^2 ) \mod 10 & = (a 10^k + b ) \mod 10 \\ b^2 \mod 10 - a^2 \mod 10 & = a 10^k \mod 10 + b \mod 10 \\ b^2 \mod 10 - a^2 \mod 10 & = b \mod 10 \\ b^2 \mod 10 - b \mod 10 & = a^2 \mod 10 \end{align} $$

A square ends in one of the digits $\{ 0, 1, 4, 5, 6, 9 \}$, so $b^2 \bmod 10 - b \bmod 10$ must also end with one of those digits because it's the same number. If we take some $b$ and compute $(b^2 - b) \bmod 10$, only values of $b$ ending in $\{ 0, 1, 3, 5, 6, 8 \}$ produce a value that is in $\{ 0, 1, 4, 5, 6, 9 \}$ (those values being $\{ 0, 6 \}$):

	(0*0 - 0) mod 10 = 0  works
	(1*1 - 1) mod 10 = 0  works
	(2*2 - 2) mod 10 = 2  doesn't work
	(3*3 - 3) mod 10 = 6  works
	(4*4 - 4) mod 10 = 2  doesn't work
	(5*5 - 5) mod 10 = 0  works
	(6*6 - 6) mod 10 = 0  works
	(7*7 - 7) mod 10 = 2  doesn't work
	(8*8 - 8) mod 10 = 6  works
	(9*9 - 9) mod 10 = 2  doesn't work

The only final digits of $a$ whose squares end in $\{ 0, 6 \}$ are $\{ 0, 4, 6 \}$. This means that we can skip all $a$ ending in $\{ 1, 2, 3, 5, 7, 8, 9 \}$.
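The last-digit argument is small enough to enumerate mechanically. This Python sketch (variable names are illustrative) recomputes the sets from scratch.

```python
# Last digits that a perfect square can have.
square_endings = {(d * d) % 10 for d in range(10)}

# Last digits of b for which (b^2 - b) mod 10 could equal a^2 mod 10.
b_ok = [d for d in range(10) if (d * d - d) % 10 in square_endings]

# The values (b^2 - b) mod 10 actually takes for those b.
needed = {(d * d - d) % 10 for d in b_ok}

# Last digits of a whose squares end in one of the needed values.
a_ok = [d for d in range(10) if (d * d) % 10 in needed]

print(sorted(square_endings))  # [0, 1, 4, 5, 6, 9]
print(b_ok)                    # [0, 1, 3, 5, 6, 8]
print(sorted(needed))          # [0, 6]
print(a_ok)                    # [0, 4, 6]
```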

From there, it's brute force to check all $a$ ending in $\{ 0, 4, 6 \}$ from $1 \cdot 10^{k-1}$ to $6.2 \cdot 10^{k-1}$.
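Putting the pieces together, the whole first algorithm fits in a few lines. This is a minimal sketch in Python, assuming the integer-square-root test for $b$ and the last-digit filter described above; it is not the Perl or C from the repository, and `excellent_numbers` is a hypothetical name.

```python
from math import isqrt


def excellent_numbers(k: int):
    """Yield every excellent number with 2k digits, in increasing order."""
    shift = 10**k
    b_max = shift - 1
    limit = b_max * (b_max - 1)          # nothing above this can match a b
    for a in range(10**(k - 1), shift):
        target = a * a + a * shift       # must equal b * (b - 1)
        if target > limit:
            break                        # a is past its maximum value
        if a % 10 not in (0, 4, 6):
            continue                     # ruled out by the mod 10 argument
        b = isqrt(target) + 1            # the only candidate for b
        if b * (b - 1) == target:
            yield a * shift + b


for k in range(1, 5):
    print(2 * k, list(excellent_numbers(k)))
```

Pure Python handles the small cases quickly, but each extra digit in $a$ still multiplies the work by ten, which is what motivates the move to C.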

Brass tacks

Now that we have an algorithm that excludes most of the numbers, we have to compute some large numbers, where large is relative to the integer size of computer processors. Each jump in number of digits, say from 24- to 26-digits, is a ten-fold increase in numbers to check.

The GMP library can do that for us. I first tried to do that with the Math::GMP library for Perl but it was too slow since it had to convert Perl data structures to the C data structures. I eventually dropped down to C completely. That's pretty fast. I can exhaust the 26-digit space in a couple of days on a handful of less-than-decent processors.

But then Sinan Ünür had the idea to try it in 128-bit integer math using compiler support for that size on 64-bit hardware. Instead of using GMP's general purpose high precision stuff, we could make very specific optimizations with none of the stuff we didn't want. This would handle at least up to 36-digit numbers.
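A quick way to see why 128 bits is enough: the largest value the search forms is bounded by $b^2$ for the largest $k$-digit $b$. The sketch below is an illustration in Python, not the project's C code, and `bits_needed` is a made-up helper that counts the bits for a few sizes.

```python
def bits_needed(k: int) -> int:
    """Bits required to hold b**2 for the largest k-digit b."""
    b_max = 10**k - 1
    return (b_max * b_max).bit_length()


for k in (9, 18, 20):
    print(2 * k, "digits:", bits_needed(k), "bits")
```

By this count, numbers up to 18 digits fit in 64-bit integers, and 36-digit numbers need 120 bits.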

This table has timings from my mid-2012 MacBook Air running a 1.8 GHz Intel Core i5.

Digits  Processor time
        GMP        int128
2       5 ms       4 ms
4       5 ms       4 ms
6       5 ms       6 ms
8       5 ms       6 ms
10      15 ms      6 ms
12      110 ms     7 ms
14      1 s        25 ms
16      11 s       221 ms
18      130 s      2.2 s
20      20 min     22 s
22      3.5 hours  220 s
24      1.5 days   37 min
26      15 days    6.1 hours
28      150 days   2.5 days
30      4 years    21 days
32                 210 days
34                 6 years
36                 60 years

That's just my laptop. The next tactic is to go wide. Instead of one processor, break up the problem and distribute it among several processors. For instance, for 32-digit numbers, it takes 210 days on one processor, or a month on 7 processors or 3 days on 70 processors.
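One simple way to break up the problem is to hand each processor a contiguous slice of the $a$ values. This is only a sketch of that bookkeeping in Python; `split_range` is a hypothetical helper, not something from the project.

```python
def split_range(start: int, stop: int, workers: int):
    """Split [start, stop) into `workers` contiguous, near-equal chunks."""
    base, extra = divmod(stop - start, workers)
    chunks = []
    lo = start
    for w in range(workers):
        hi = lo + base + (1 if w < extra else 0)  # spread the remainder
        chunks.append((lo, hi))
        lo = hi
    return chunks


# Example: the 3-digit values of a, from 100 up to 616, over 7 workers.
for lo, hi in split_range(100, 617, 7):
    print(lo, hi - 1)
```

Each candidate $a$ is tested independently of the others, so the chunks need no communication until the results are merged.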

The "Cheating" Algorithm

Matthew Arcus suggested a solution based on quadratic Diophantine equations. The genius of this approach is what mathematics is all about: reducing problems to known problems. First, collect terms:

$$ \begin{align} b^2 - a^2 &= a 10^k + b \\ b^2 - b &= a^2 + a 10^k \end{align} $$

Next, you want to move everything around to make some nicer polynomials by completing some squares:

$$ \begin{align} b^2 - b &= a^2 + a 10^k \\ 4b^2 - 4b + 1 - 1 &= 4a^2 + 4a 10^k + 10^{2k} - 10^{2k} \\ (2b - 1)^2 - 1 &= (2a + 10^k)^2 - 10^{2k} \\ 10^{2k} - 1 &= (2a + 10^k)^2 - (2b - 1)^2 \\ &= A^2 - B^2 \\ &= (A + B)(A - B) \\ &= i \cdot j \end{align} $$

Let $i$ and $j$ be a pair of divisors whose product is $10^{2k} - 1$. Finding those divisors is an expensive problem for large $k$. But, this is where the cheating comes in. First, some math jokes:

A mathematician, a physicist, and an engineer are asked to find the volumes of rubber balls. The mathematician measures their circumferences to sum their volumes. The physicist submerges them in water and notes their displacement. The engineer finds their serial numbers and looks up their volume.
A physicist, an engineer, and a mathematician compare strategies to solve the flickering lightbulb in the faculty lounge. The physicist says he can predict when it will blow out so they can replace it sooner. The engineer designs a new, longer lasting lightbulb with novel materials. The mathematician uses a broom to break the bulb, saying "Every time I do this a new one shows up".

So far, you've been doing a lot of work, but Matthew's great inspiration is that you already know the factors of $(10^{2k} - 1)/9$. People have already done the work to factor 1, 11, 111, 1111, and so on. These are called repunits (for repeated units). You can look up lists of those factors, then add two factors of 3 to get to 9, 99, 999, 9999, and so on.

Once you have those factors, you can compute all the divisors of $10^{2k} - 1$. Take one divisor $i$ and compute the corresponding $j$, with $i < j$.

$$ \begin{align} i &= (A - B) \\ j &= (A + B) \\ \end{align} $$

Solve for $A$ and $B$. Since $10^{2k} - 1$ is necessarily odd, its two divisors must also be odd. Their sums and differences are necessarily even:

$$ \begin{align} A &= (j+i) / 2 \\ B &= (j-i) / 2 \\ \end{align} $$

Bring back the definitions of $A$ and $B$:

$$ \begin{align} (2a + 10^k) &= (j+i) / 2 \\ (2b - 1) &= (j-i) / 2 \\ \end{align} $$

Solve for $a$ and $b$:

$$ \begin{align} a &= ( (j+i) / 2 - 10^k ) / 2 \\ b &= ( (j-i) / 2 + 1 ) / 2 \\ \end{align} $$

If both $a$ and $b$ have $k$ digits, the pair is a candidate for an excellent number. Check that they satisfy $b^2 - a^2 = ab$. If they do, you've found an excellent number.
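The divisor-pair procedure can be sketched as follows. This is illustrative Python, not Matthew's implementation. The function names are hypothetical, and the caller must supply the factorization of $10^{2k} - 1$ as a `{prime: exponent}` dictionary. The example uses $10^6 - 1 = 3^3 \cdot 7 \cdot 11 \cdot 13 \cdot 37$.

```python
def divisors(prime_powers: dict) -> list:
    """All divisors of a number given as {prime: exponent}."""
    result = [1]
    for p, e in prime_powers.items():
        result = [d * p**x for d in result for x in range(e + 1)]
    return result


def excellent_from_factors(k: int, prime_powers: dict) -> list:
    """Find the 2k-digit excellent numbers from divisor pairs i * j."""
    n = 10**(2 * k) - 1
    shift = 10**k
    found = []
    for i in divisors(prime_powers):
        if n % i != 0:
            raise ValueError("factorization does not match 10**(2k) - 1")
        j = n // i
        if i >= j:
            continue                          # keep only pairs with i < j
        A, B = (j + i) // 2, (j - i) // 2     # i and j are both odd
        if (A - shift) % 2 or (B + 1) % 2:
            continue                          # a or b would not be an integer
        a, b = (A - shift) // 2, (B + 1) // 2
        if 10**(k - 1) <= a < shift and 0 < b < shift:
            if b * b - a * a == a * shift + b:
                found.append(a * shift + b)
    return sorted(found)


print(excellent_from_factors(3, {3: 3, 7: 1, 11: 1, 13: 1, 37: 1}))
```

The digit-count test does the real filtering here. Most divisor pairs give an $a$ or $b$ that is too long or too short.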

The cheating reduces this problem to a relatively trivial one. What you would do in years of computational work before takes mere seconds. Before you had to check $3 \cdot 10^{k-1}$ candidates and each bump of $k$ required ten times the computational power. With this method, the number of prime factors does not necessarily increase with $k$. The next set of numbers might take less time!

$2k$  $10^{2k} - 1$ factors  $10^{2k} - 1$ divisors  $3 \cdot 10^{k-1}$
2     3                      7                       3
4     4                      13                      30
6     7                      65                      300
8     6                      49                      3000
10    6                      49                      30,000
12    9                      257                     300,000
14    6                      49                      3 million
16    8                      193                     30 million
18    11                     641                     300 million
20    9                      385                     3 billion
22    9                      289                     30 billion
24    12                     2,049                   300 billion
26    8                      193                     3 trillion
28    10                     769                     30 trillion
30    15                     16,385                  300 trillion
32    13                     6,145                   3 quadrillion
34    8                      193                     30 quadrillion
36    14                     5,121                   300 quadrillion

All of this is to say that Matthew has completely ruined most of the fun we were having. He's found them all. The timings on each of these runs are so fast they aren't even worth compiling. Most of the time was just the startup of the program. Even on my crappy hardware all times were under a second.

Some conclusions

I started this project to have some fun. I tried some new languages and several new techniques. I read quite a bit about mathematics. I learned quite a bit about high-performance computing.

Had I investigated the problem and come up with Matthew's solution right away, I would have missed all of the exploration. I would have missed the patterns in the numbers because I never would have closely inspected the output looking for patterns that I might feed back into my crude optimizations.

Besides, it was never about the numbers anyway.

Contributions

How to contribute

There is a list of things you can do:

Further reading

I started this as a Perl project, so I've written about this on my Perl blogs. Since then, I've left Perl behind because native C is so much faster.

Watch out for some of the other stuff you find out there. Most other spots have errors. If you see an excellent number that has an odd number of digits, you know they messed up somewhere.