## Detecting prime numbers with regex

The suggestions coming from Github Copilot look sometimes like alien technology, particularly when some incomprehensible code actually works. Recently, I stumbled upon this little excerpt that tests if a number is prime (and it actually works):

The ways of the universe are mysterious

Let's dissect the expression to understand it a bit better. First, the number that we are trying to check is converted into a sequence of that same amount of ones with `'1'.repeat(n)`

. Hence, the number `6`

becomes `111111`

. We can already see why this is a fun trivia and not something you should be using in your code (imagine testing for `1e20`

), and why should always inspect the code from Copilot.

This list of ones is tested against the regex, so that if there is some match, the number is not prime. If you're not very used to regular expressions, I suggest learning it with some resource like RegexOne or Regex Golf; it's one of those tools that come in handy regardless of the technology you use, either to test strings or to find and replace stuff quickly. It's combines really well with the multiple cursors from modern IDEs.

The regex `/^1?$|^(11+?)\1+$/`

will then only match non-prime numbers, so let's inspect it. First, it can be split into two expressions separated by a disjunction operator `|`

. The first is `^1?$`

, which will match zero or one, the first two non-prime natural numbers. Then, `^(11+?)\1+$`

, which is where the magic occurs. The first part `(11+?)`

will match a sequence of two or more ones, but in a non-greedy way, so that it will match the smallest possible sequence. The second part `\1+`

will then match the same sequence repeated one or more times.

Since the whole expression is anchored to the beginning and the end of the string using `^`

and `$`

, it will only match strings made of some sequence that is repeated a number of times. And how can a sequence be a repeated a number of times? Well, not being a prime number. For instance, in the case of `6`

, the sequence `11`

is repeated three times, so it matches the expression, because `6`

is the product of `2*3`

.

```
^1?$ # an empty string or a single 1
| # or
^ # start of the string
(11+?) # a sequence of two or more ones
\1+ # repeated one or more times
$ # end of the string
```

The original trick was developed in 1998 by @Abigail, a hacker very involved in the development of Perl, who keeps writing wild regex solutions to problems such as such as completing a sudoku or solving the N-Queens problem to this day. This expression is resurrected every few years, puzzling new generations of programmers. The next time you see one of these AI weird suggestions, if you pause to inspect it and do a bit of code archeology, you might find another piece of programming history.

Related posts: