Buttons in case you want to go back to blogindex or back to front page

Optimizing a prime factorization algorithm

Introduction

I was recently introduced to the Project Euler-problemset. As a friend of problems and a fairly experienced C-programmer, I decided to give them a try.

The third problem, called "Largest prime factor" was about, well, finding the largest prime factor of a given number. This is quite easy to brute-force even for an inexperienced programmer and I came up with a solution of my own in what must've been less than a couple of minutes.

After thinking about it for a while, I decided that I would attempt to further optimize my solution even though it was already enough to solve the given problem with ease. The main motive for this optimization was that I knew that I'd have to rewrite the function anyways for a calculator program that I'm working on together with a friend. The program is called testauskalkki and it's written in Rust.

Due to this and the fact that I can't stand bad (suboptimal) code I started the process of optimizing my algorithm

Technical disclaimer

All tests of the programs included in this article are run on the same machine. The time-values are generated by a program called hyperfine (apart from the first one, since that is way too long for it to make sense to benchmark it with hyperfine.

The tests are run on my T440p and the programs are compiled with GCC using the following command:
gcc -O3 -o output -lm input.c.

The input for every program is: 700851475143

1. Initial algorithm

The code I came up with for initially solving the problem is the following:

#include <stdio.h>

// A very Q&D prime factorization

long long
nextprime(long long n)
{
    for (long long i = n+1; i > 0; ++i) {
        char prime = 1;

        for (long long j = 2; j < i; ++j) {
            if (i % j == 0) {
                prime = 0;
                break;
            }
        }

        if (prime) {
            return i;
        }
    }
    return -1;
}

int
main()
{
    long long n;
    scanf("%lld", &n);

    long long prime = 2;

    while (n > 1) {
        if (n % prime == 0) {
            printf("%lld\n", prime);
            n /= prime;
        } else {
            prime = nextprime(prime);
        }
    }
    return 0;
}

As the comment suggests this algorithm is extremely naive and therefore really slow.

A brief description of how the program operates:

  1. The program reads n from input and assigns prime to 2, the first prime number.
  2. The program starts a loop with the condition n > 1 In this loop the program checks whether n is evenly divisible by prime and if that is the case the program prints out the value of prime as it is a prime factor of the number and set n equal to n/prime
  3. If n isn't divisible by prime the program assings prime to the output of the function nextprime when passed prime as an argument.
  4. The nextprime function operates by looping though every number from its argument n to infinity checking if it's a prime number (by going throuhg every number below it and checking if the number is divisible by any of them)
  5. When the next prime number is found the function returns it. If nothing is found the function returns -1(this is never reached nor handled)

With the given input this program takes approximately 1 hour to compute the answer.

So... As we see, we have a long way to go in optimizing the code

2. The first and biggest optimization

Most of the optimization to the algorithm can be done in the nextprime function. Let's start by splitting the function to two separate functions.

The new function is called isprime and it returns a boolean value: if a the number given as an argument is prime or not.

To this new function we move the inner for-loop of the original function. This itself doesn't optimize the code but it chops it into smaller pieces making it easier to work with.

Now for the optimization:
As some of you might know a number can only be divisible by another number smaller than or equal to it's square root. Therefore in isprime We only need to loop through the numbers from 2 to sqrt(n) which saves a lot of time. Following this optimization the two functions look like this (main stays unchanged and therefore isn't included)

char
isprime(long long n)
{
    if (n == 1) return 0;
    if (n == 2) return 1;
    if (n == 3) return 1;

    int sq = (int)sqrt(n);
    for (int i = 2; i <=; sq; ++i) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

long long
nextprime(long long n)
{
    for (long long i = n+1; i > 0; ++i) {
        if (isprime(i)) {
            return i;
        }
    }
    return -1;
}

Additionally #include <math.h> is required for sqrt

This optimization might seem like a small change but it actually saves a lot of time as the runtime goes from 1h to ~3.5s which is >1000x faster!!

BUT WAIT! I'm not done yet!

3. Doubling the performance

The next optimization should be quite an obvious one. We know that 2 is the only even prime number. So we don't need to check for even numbers (apart from 2) This is an easy fix as we can just increment the number by 2 instead of 1

In addition to this we can also make the observation that every even number is divisible by 2 and therefore checking for divisibility with even numbers after making sure the number is not even in isprime is completely redundant.

After implementing this, the relevant parts of the code look like this:

char
isprime(long long n)
{
    if (n == 1) return 0;
    if (n == 2) return 1;
    if (n == 3) return 1;
    if (n % 2 == 0) return 0;

    long long sq = (long long)sqrt(n);

    for (int i = 3; i < sq; i+=2) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

long long
nextprime(long long n)
{
    if (n == 2) return 3;

    for (long long i = n+2; i > 0; i+=2) {
        if (isprime(i)) {
            return i;
        }
    }
    return -1;
}

As the title states this halves the amount of checks and therefore doubles the performance making the runtime around 1.7 seconds Some might consider this enough but I'm still not happy and want to break the 1 second barrier

4. Breaking the 1 second barrier

Now that even numbers (=numbers divisible by 2) are dealt with the next logical step is to deal with numbers divisible by 3.

As it so happends, every third odd number is divisible by three. How? you might ask. Consider the following odd numbers: 3 5 7 9 11 13 15. Off these numbers 3, 9 and 15 are divisible by 3 and the rest aren't.

This means that we can skip every third of the numbers we are currently iterating through and that will further improve the efficiency by ~1/3 or 33%.

With that observation, here's the code:

long long
nextprime(long long n)
{
    if (n == 2) return 3;
    char inc = 4;

    for (long long i = n+2; i > 0; i+=inc) {
        if (isprime(i)) {
            return i;
        }
        inc = 6 - inc;
    }
    return -1;
}

With this improvement we finally manage to get a runtime of <1 second! (~950ms to be exact)

This is as far as I went with the optimizations before opting for a different strategy

5. Don't re-invent the wheel

As it seems the optimizations I've been implementing to my algorithm are somewhat similar to how an algorithm called Wheel factorization is implemented.

After reading through the aforementioned Wikipedia-article describing the implementation I came up with my own version of the Wheel factorization algorithm that best-fitted my needs.

In brief the Wheel factorization algorithm essentially combines the two functions+main from my previous solutions and makes them even more efficient. Instead of checking on every number whether It's a prime or not, the algorithm has a Wheel of values that it uses to increment the variable so that it's always a prime number. Then it proceeds in a similar way to my programs main checking whether the prime is a factor of the number.

Here's my implementation of the Wheel algorithm:


#include <stdio.h>
#include <math.h>

int
main()
{
    unsigned long long n;
    scanf("%llud", &n);

    char inc[] = { 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6 };
    unsigned long long k = 2;
    char i = 0;

    while (k*k <= n) {
        if (n % k == 0) {
            printf("%lld\n", k);
            n /= k;
        } else {
            k += inc[i];

            if (i < 10) {
                ++i;
            } else {
                i = 3;
            }
        }
    }

    if (n > 1) {
        printf("%lld\n", n);
    }
    return 0;
}

With this code, I'm able to compute the prime factors of the input number in less than a millisecond which is crazy fast!

Conclusion

I deem this experiment as a succesful one because during this experiment did I not only get to optimize some code but I also learned a lot of stuff about prime numbers that I wasn't familiar with before. Overall I see writing good code as a very positive experience even though it might be difficult to come up with clever stuff like this. Luckily for us there are a bunch of pre-existing algorithms designed by people much more clever than myself. Learning to implement these algorithms can be very rewarding and might just help us understand the idea behind them.

I'm sure the code can still be improved somehow, but my current solution is what I'd consider good enough. If you can come up with some further improvements, please email them to me and I might update them to this article.

Thank you for reading this article! Feel free to use the code featured in this article to whatever you please :)