My project looked at the risks of refusing vaccination. One of the things I came to appreciate was the importance of using efficient calculation methods in computer-based mathematics. I wanted to produce a heatmap showing the probability of becoming infected with a particular disease against the average number of people each infected person transmits to and the rate at which cases are imported into a community from outside. My original calculation method took 5 minutes, but better methods allowed me to reduce this to 1.7 seconds.
Computers are incredibly fast. It may not seem that way when you are waiting for your PC to turn on, but computers can perform an enormous number of calculations in very little time.
This was one of the things I realised while completing my project on the risks of vaccine refusal. In one particular script I had to perform 567 000 calculations. Running this script took just over 7 seconds on my Surface 3.
However, as my scripts became larger, I started to realise that calculation speed would be an issue. I wanted to make a heatmap showing the probability of catching a disease versus the effective reproduction number, Re (the average number of people someone transmits to after becoming sick), and the importation rate, p (the proportion of people in a community who catch the disease from some external source). I wanted to make a 100 by 100-point map, meaning that I needed to calculate 10 000 probabilities.
To perform this calculation, I used the susceptibility set, which is effectively all the people who would, either directly or indirectly, pass the disease on to a given person if they themselves became sick. Let's say there are m people in someone's susceptibility set. Then the probability that this person avoids infection is the probability that none of those m people import the disease, or (1 - p)^m, an easy calculation.
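For a quick sense of the numbers (the values here are made up purely for illustration): if p = 0.01 and m = 50, the probability of avoiding infection is 0.99^50, which is about 0.61, so the probability of infection would be roughly 0.39.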
But the size of the susceptibility set is unknown. We did know that the number of people who would transmit directly to someone follows a particular probability distribution that depends on the effective reproduction number, so I used a previously developed method (Miller, 2018) to calculate the distribution of susceptibility set sizes. Combining these two sets of probabilities gave the total probability of infection.
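To make the idea concrete, here is a minimal Python sketch (not my actual project code) of how the two sets of probabilities combine. It assumes we already have an array, size_probs, where size_probs[m] is the probability that the susceptibility set contains exactly m people; producing that array is the expensive part described next.

```python
import numpy as np

def infection_probability(p, size_probs):
    """Total infection probability from the importation rate p and a
    pre-computed susceptibility set size distribution.

    size_probs[m] is assumed to hold P(susceptibility set has size m).
    """
    sizes = np.arange(len(size_probs))
    # A person avoids infection only if none of the m people in their
    # susceptibility set import the disease: probability (1 - p)**m.
    avoidance = (1 - p) ** sizes
    return 1 - np.sum(size_probs * avoidance)

# Illustrative (made-up) distribution over sizes 0, 1, 2 and 3:
print(infection_probability(0.01, np.array([0.4, 0.3, 0.2, 0.1])))
```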
Unfortunately, calculating the susceptibility set size distribution was computationally heavy, especially when it had to be done 10 000 times. The calculation time varied depending on the accuracy I wanted, but one test I ran took over 5 minutes. (I could run less accurate calculations in about 30 seconds, but this was not ideal.)
The first breakthrough came when I realised that I could estimate the probability of avoiding infection, which I called s, by starting with an initial estimate s_0 = 1 - p and then repeatedly applying the formula s_n = (1 - p)e^(Re(s_(n-1) - 1))* to improve the estimate until it was good enough. This made a massive difference to my calculation times, cutting them down to 3.8 seconds in my test, so that I could make a higher-resolution, 200 by 200-point heatmap.
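As a rough sketch (with an illustrative tolerance and iteration cap, not necessarily the settings I used), the fixed-point iteration looks like this:

```python
import math

def avoid_probability_iterative(p, Re, tol=1e-10, max_iter=1000):
    """Estimate s, the probability of avoiding infection, by fixed-point
    iteration: s_n = (1 - p) * exp(Re * (s_{n-1} - 1)).

    tol and max_iter are illustrative choices, not the project's values.
    """
    s = 1 - p  # initial estimate s_0
    for _ in range(max_iter):
        s_new = (1 - p) * math.exp(Re * (s - 1))
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

# Probability of infection for one (Re, p) pair:
print(1 - avoid_probability_iterative(p=0.01, Re=1.5))
```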
But then I realised something else. Since the iteration converges to s itself, I could write s = (1 - p)e^(Re(s - 1)). I couldn't solve this equation directly, but I could use Newton's method, an even faster technique, to estimate the solution. This cut calculation times down to 1.7 seconds, allowing me to make a 300 by 300-point heatmap in about 16 seconds. I'm sure there are still faster ways to perform this calculation, but I think this is good enough for now.
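Under the same assumptions, a Newton's method version might look like the sketch below. It solves f(s) = s - (1 - p)e^(Re(s - 1)) = 0; the starting point, tolerance and iteration cap are my illustrative choices rather than the project's exact settings.

```python
import math

def avoid_probability_newton(p, Re, tol=1e-12, max_iter=50):
    """Solve s = (1 - p) * exp(Re * (s - 1)) for s using Newton's method on
    f(s) = s - (1 - p) * exp(Re * (s - 1)).

    Starting point, tol and max_iter are illustrative choices.
    """
    s = 0.0  # starting at 0 keeps Newton on the root between 0 and 1
    for _ in range(max_iter):
        expo = (1 - p) * math.exp(Re * (s - 1))
        f = s - expo
        f_prime = 1 - Re * expo
        step = f / f_prime
        s -= step
        if abs(step) < tol:
            break
    return s

# Each heatmap point is one infection probability, 1 - s:
for Re in (0.5, 1.5, 3.0):
    print(Re, 1 - avoid_probability_newton(p=0.01, Re=Re))
```

One design note on this sketch: because f is concave, starting Newton at s = 0 walks steadily up to the solution between 0 and 1, whereas starting too close to 1 can land on a second, non-physical solution above 1 when Re is greater than 1.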
*If you are not very familiar with mathematics, you might not know of e. It is a constant often found in exponentials and is approximately equal to 2.71828.
Reference
Miller, J.C., 2018. A primer on the use of probability generating functions in infectious disease modeling. Infectious Disease Modelling, 3, pp.192-248.
Daniel Roberts
La Trobe University
