[PYTHON] How to find Prime Numbers. Fast. Feat. Cython.

There is only one thing that I love more than algorithms: Improving algorithms! This article will not explain one of those sophisticated prime solving formulas — but it will show you, how thoughtful thinking and tinkering will help you with optimising your code.

Nicky Reinert
6 min readDec 11, 2023

Nota bene: Do you need the JavaScript version? Check it out here!

What’s a prime number? All of you know that: It’s a number with only two divider natural divisors: 1 and the number itself.

And when you ask someone for a prime number algorithm, you get two answers: A mathematican will throw out a complex formula, like Mill’s formula:

Wait, what? Ok. The other answer is just a simple loop that checks if there is another number, lower than the number in question, where the operation modulo returns 0. The so called “trial division”. In other words: The number in question divided by the other number returns a third number and all numbers are natural, not factorized.

def isPrime(number):
for i in range(2, number):

if (number % i) == 0:
return False

return True

Let’s run this loop to find primes up to 50.000:

%%time

count = 0

for number in range(2, 50000):
if isPrime1(number):
count += 1

print(f'found {count} primes')

On my machine this will take 6 seconds and find 5.133 primes (your results may differ on a different machine):

So, if you are not that much into number theory, what options do we have to improve this algorithm, with some simple tricks?

Let’s start with some obvious things: We don’t need to test all numbers in our “test division”. In fact we can skip every divisor lower than “number / 2”. Why is that? Because if the divisor is greater than “number / 2” the result must be lower than 2 and a faction.

def isPrime(number):
limit = int(number / 2)
for i in range(2, limit + 1):

if (number % i) == 0:
return False

return True

count = 0

for number in range(2, 50000):
if isPrime(number):
count += 1

print(f'found {count} primes')

Actually, this halves the caluclation time to 3 seconds!

But there’s even more in this limit declaration: We can reduce it to the square root (sqrt) of our target number. Because if one divisor is greater than sqrt(number) the other divisor must be lower than sqrt(divisor). Meaning: They have already been checked. Let’s do it:

%%time
import math
def isPrime(number):
limit = int(math.sqrt(number))
for i in range(2, limit + 1):

if (number % i) == 0:
return False

return True

Wooohoo — this leads to an impressive reduction to some milliseconds:

What’s next?

Why do we check even numbers at all? It’s “natural law” that an even number cannot be a prime (except the magic number 2).

We can safely skip them in our main loop. But not only there, we can also safely skip them in the “test division”. An even factor always leads to an even product in a multiplication. This leads to the assumption, that an even number always leads to a modulo unequal 0 when thrown against a prime!

In Python there is a simple trick to identify even or odd numbers: We can just check the last digit:

str(integer)[-1]

Let’s add this logic to both loops:

%%time
def isPrime(number):
limit = int(math.sqrt(number))
for i in range(2, limit + 1):

if str(i)[-1] in [0,2,4,5,6,8]:
continue

if (number % i) == 0:
return False

return True

for number in range(2, 50000):
if str(number)[-1] in [0,2,4,5,6,8]:
continue
if isPrime(number):
count += 1

print(f'found {count} primes')

Something strange happend:

The processing time just increased tremendously. Why that? We should implement the last test a little smarter. Right now we are using some string operations and apparently they are slow as heck.

Python allows us to change the step rate of the range() function. This way we could just skip even numbers in the loop itself. Way easier. Heads up: As we will start at 3 now, we increment the counter integer beforehand:

%%time
def isPrime(number):
limit = int(math.sqrt(number))
for i in range(3, limit + 1, 2):

if (number % i) == 0:
return False

return True

count = 1
for number in range(3, 50000, 2):

if isPrime(number):
count += 1

print(f'found {count} primes')

Waaaay faster — We are back on track. This solution halves the best result from above.

But wait… have you checked the JavaScript version of this script? Our script is quick bu JavaSript is above and beyond. It found all primes up to 100 Mio. in around 40 seconds… The Python script took 10 Minutes when I interrupted it — with no result. This is devastating, so let’s get back the crown.

There is a way to make Python scripts faster, not by improving the algorithm but the way of running the script. Python is a versatile, high-level language, probably easy to learn and with a huge community. But it’s slow because of several facts, one of them is the “dynamic variable typing”. The quick explanation: A variable can be a string, integer or float. Python doesn’t care. Thats why, for example, a division always returns a float, if you not use the double slash (//):

A float is much more complex than an integer. An integer is just a couple of activated bits, while a float… ok, I don’t want to fall into that rabbit hole. It’s complex and therefore slow.

The answer is “Cython”, bascially also a programming language, but with more C-like features. That’s why it’s call Cython. The trick is now to change our Python code to make it work with Cython.

First make sure that you have the Cython module installed:

pip install cython

This is not too difficult, we just declare our variables upfront:

def isPrime5(int number):
cdef int limit # our limit is always an integer
limit = <int>math.sqrt(number) # the result is always an integer
cdef int i # the loop variable is always an integer
for i in range(3, limit + 1, 2):
if (number % i) == 0:
return False
return True

# additionally, we organize the code a little
def countPrimes():
# and again some upfront declarations
cdef int count = 1
cdef int number
cdef float start = time.time()

for number in range(3, 100000000, 2):
if isPrime5(number):
count += 1

print(f'found {count} primes after {time.time()- start} seconds')

if __name__ == "__main__":
countPrimes()

Name this file primesC.pyx and safe it.

Now we need an additional file called “setup.py” that tells Cython how our script works:

from setuptools import setup
from Cython.Build import cythonize

setup(
ext_modules = cythonize("primesC.pyx")
)

Now you compile your script using this command:

python setup.py build_ext --inplace

This will create a (binary) file named primesC.cpython-311-darwin.so which is bascially just a Python module. You can now quickly reference it on the command line like this (or use it in your Python code):

python -c "import primesC; primesC.count_primes()"

And there you go: Instead of waiting Ten minutes, the script now returns a result after almost 79 seconds:

Well, this is still way slower than JavaScript (40 seconds), but it also improves the initial script tremendously.

Mandatory Preview Image

--

--

Nicky Reinert
Nicky Reinert

Written by Nicky Reinert

generalist with many interests, developer, photograph, author, gamer

No responses yet