23 April 2026
Introduction
Our first foray into a coding puzzle begins with the first problem you'll see when you head on to Project Euler (and, incidentally, has the easiest puzzle rating of 0). The problem statement is set out as follows:
"Multiples of 3 or 5:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6, and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000."
Conceptually, the solution seems quite straightforward; iterate through a sequence of numbers and add up those that are divisible by either 3 or 5. Thinking about this for a bit, it seems like it's easy enough to iterate through each number in the range, and then check which numbers are divisible by 3 or 5. So let's try this solution.
1. Naive solution: Iterate through all numbers in the range
Something like this should get the job done:
# 1. naive loop through sequence approach
sum_list = []
q = 10
sq = range(1, q)
for i in sq:
if (i % 3 == 0) | (i % 5 == 0):
sum_list.append(i)
else:
pass
pd.Series(sum_list).sum()
>> np.int64(23)
We first check our result for the example of 10, which indeed we see is what we expect (23). We now try this over the range from 1 to 999. Other than aiming to get the solution right, let's time how long this process takes, with the aim of improving this time in mind.
pd.Series(sum_list).sum()
>> np.int64(233168) # Answer
>> datetime.timedelta(microseconds=542)
So we arrive at the correct answer (233168) which is encouraging (you can check the answer here). Quick enough for the range from 1 to 999 (at 0.005 of a second!). But, what if we wanted to do this over the range 1 to 999 999? Or any other super large number, like 250M? Our solution here may take an impractical amount of time to find the solution over a very large range. Let's find out for ourselves, let's see how long it takes to find the solution for the range 1 to 500M:
pd.Series(sum_list).sum()
>> np.int64(58333332916666668) # Answer
>> datetime.timedelta(seconds=67, microseconds=875617)
So this took ~68 seconds to process. Still not a lengthy period of time, but, the obvious problem here is that our computation time scales with the input range: the greater the input, the longer the time taken to complete the process. Let's think about how we can trim this down.
2. Faster solution: Vectorise
We could try and eliminate the need to iterate through our range. In this solution, we create the range over which we wish to apply our process, and then apply a mask (or filter) to those rows that suit our filter, and then sum these numbers up. Something like below:
# 2. vectorised approach
q = 500_000_000
sq = range(1, q)
vec_t1 = datetime.datetime.now()
df = pd.DataFrame({"sq": sq})
mask = (df["sq"] % 3 == 0) | (df["sq"] % 5 == 0)
vec_result = df.loc[mask, "sq"].sum()
vec_t2 = datetime.datetime.now()
print(vec_result); print(vec_t2 - vec_t1);
>> np.int64(58333332916666668)
>> datetime.timedelta(seconds=5, microseconds=791858)
We arrive at the same answer, thankfully, but we've managed to get to the answer in ~6 seconds, which is about 11 times faster than our iterative approach! Nevertheless, whilst the time taken to process this solution is minimal, it still suffers from the same problem the iterative solution does: the time to find the solution scales with the input. An ideal solution to this problem would be one that doesn't scale with the input.
3. Fastest solution: Arithmetic Series
To achieve this, we turn to a mathematical shortcut: the arithmetic series. This is just a sequence of numbers whose difference between consecutive terms is constant. For example, 3, 6, 9, and so on. This series increments by 3, and thus, each member of the series is divisible by 3.
Impressively, there is a formula that allows us to sum the members (numbers) of the arithmetic series (with a defined start and end point) without having to iterate through all members of that series. That equation takes the following format:
If we simply take the series incrementing by 3, starting at 3 and ending at 9, we can quickly see the sum of this series is 18. Let's put it through our equation to see if we get the same number:
which indeed is 18!
What this now means is that we can use this equation to sum up all the numbers divisible by 3 and 5 across our range. We simply add up the members in the series incrementing by 3, from 1 to 500M, and then do the same for those for the series incrementing by 5!
Ah, but there's a small catch: if we sum all numbers in the 3 arithmetic series, and 5, we will double count the multiples of 3 and 5! For example, in a series incrementing in 3 from 0 - 20, we get 3, 6, 9, 12, 15, 18. In a series incrementing in 5 across the same range, we get 5, 10, 15, 20. 15 is counted twice! Thus, a simple fix is to remove 15 once (as it will be counted in either the series of 3 or 5). Luckily, we can use the same equation to identify multiples of 3 and 5 along our range. So, our angle of attack now looks like this:
where 3n, 5n, and 15n are the sum of the terms in the arithmetic series incrementing by 3, 5, and 15 respectively, and both going across our specified range.
Let's first check that our logic is sound and this works as expected. The following code implements this solution, but up to 999, as the original question asked:
# 3. closed form:
# sn = n/2 * (a_1 + a_n)
# n = how many terms (??): max/multiple = eg 9/3 = 3.
# a_1 = first term (of the multiples) 3/5
# a_n = last_term (max: 9/99)
import datetime
import math
# settings
q = 500_000_000
m = datetime.datetime.now()
upper_limit = q - 1
# multiples of 3:
multiple = 3
n = upper_limit // multiple
a_1 = multiple
a_n = (upper_limit // multiple) * multiple
s_n_n1 = n * (a_1 + a_n) // 2
s_n_n1
# multiples of 5
multiple = 5
n = upper_limit // multiple
a_1 = multiple
a_n = (upper_limit // multiple) * multiple
s_n_n2 = n * (a_1 + a_n) // 2
s_n_n2
# multiples of 15 (overlap)
multiple = 15
n = upper_limit // multiple
a_1 = multiple
a_n = (upper_limit // multiple) * multiple
s_n_n3 = n * (a_1 + a_n) // 2
s_n_n3
# 3n + 5n - 15n
a_series = s_n_n1 + s_n_n2 - s_n_n3
x = datetime.datetime.now()
a_series
>> 233168
I skipped a few steps in explaining the logic in the above code, and some detail and nuance around arithmetic series, and how it's derived. If that's of interest, look here.
OK, excellent, we arrive at the same number as the other approaches (and the correct number!). Let's now try this for a sequence of 1 to 500M and see how the time compares:
>> 58333332916666668
>> datetime.timedelta(microseconds=263)
Fantastic -- we get an answer that's consistent with other approaches, but, now we get the answer in 263 microseconds ( = 0.000263 seconds). The table below illustrates the relative performance of each approach.
It's difficult to overstate the speed of the Arithmetic series approach: in this simple example it is over 250 000 times faster than the naive iterative approach! And the beauty of the arithmetic series is that the time taken is generally constant, no matter the range of the inputs: for example, let's perform the same exercise but over the range 1 to 1 billion:
>> 233333333166666668
>> datetime.timedelta(microseconds=237)
And still we see the processing time is essentially unchanged as compared to when executing over the range of 1 to 500M.
Intuitively, Range iteration and the Vectorised solution first create ranges over which some logic is then applied. Because of this, the computation time has to scale with the inputs. We use a specific notation to describe this relationship between input size and performance, called Big O notation. Range-based solutions (like 1 and 2) are known as O(n) computation: the amount of computation scales linearly with the input. In contrast, our Arithmetic series approach does not scale with input, we define it as O(1) -- constant time -- whether you are calculating the sum for the first 1,000 numbers or the first 1,000,000,000,000, the time it takes to compute remains exactly the same. Others have explained this concept in detail, and I've found this video a good introduction to the topic.
4. Why should you care?
If you are a business facing individual, this may seem like a peripheral exercise, far removed from the product you manage. But the downstream consequences of the implemented analytical approach can be significant.
Put yourself in the shoes of a Chief Marketing Officer, who has recently launched a massive nationwide loyalty program. To keep customers engaged, you've set up a reward: every 3rd purchase is 10% off until some cap is reached. As the manager, you have a strict daily budget for this. You need to monitor your liability in real-time, but as a national retailer, you're processing 50 000 to 100 000 transactions per minute.
If your system uses a brute-force approach (Range iteration and Vectorisation methods above), you're basically looping through every transaction and counting every 3rd one. Approaching the problem in this way will inevitably lead to a lag in processing, and by the time the method finishes "counting" the last ten minutes of sales, you may have already overshot your daiy budget by hundreds of thousands of Rands.
The Arithmetic Series approach turns this into a zero-lag operation, allowing you to instantly size up the rewards and make decisions in real-time.
If you've found this useful or interesting, please consider subscribing to be notified via email when I publish articles to this site. I publish long-form, in-depth articles on ML, AI, and Data Science every two weeks or so, as well as practical business-oriented coding problems, just like this one.