I will be doing a series of posts on optimization, namely software optimization for speed, in the coming weeks and months. Optimization is by far my favourite thing in programming. To me, there’s nothing quite like taking code that works and making it code that works faster. I don’t get to do it as often as I’d like, and the same will probably go for you, whether you want to or not, but I will share what knowledge I’ve garnered over the years doing this.
Step 0: Pre-preparation
(Of course this is 0-indexed…) The very first step, and please pay attention, is that YOU MUST NOT, UNDER ANY CIRCUMSTANCE, BEGIN WITH THE OPTIMIZATION. That is: FIX THE BUGS FIRST. This is critical, optimization is very often a feature, or at the very least a defect correction, but more often than not an improvement. If you speed up code that doesn’t work, it will only crash faster. There is such a thing as preemptive optimization which can lead to more trouble than worth it down the road, and it’s a documented pitfall. Consider yourself warned.
(It goes without saying that standard programming practices must be applied, such as having backups and using a form of version control for your code. You must set that up if it’s not, for your own sanity as well as for the needs of the work ahead.)
Step 1: Preparation
With a stable codebase and no (known) bugs, we can begin. Optimization (and this is one of the reasons that makes it so attractive to me) is like experimental science. In science experiments, there is a very important notion that applies here. Scientists call it control. For example, for pharmaceuticals, there is usually a “control group” which is given a pill that does nothing, a placebo, along with other groups which will be given the actual drugs being tested. In the end, every subject took “a pill”, and that’s the scientist’s basis for analysis and comparison. This is also taught in high school science: don’t contaminate your samples, which is another way to say “don’t introduce unknowns.” It’s the same thing with software, but we’ll usually talk about benchmarking.
What you need to do is have all the conditions required in order to run your benchmark, in such a way that you can get back to it easily. That could mean working with backup databases, test files, virtual machines with snapshots, etc. If your optimization requires something more “exotic”, say simulating a faint wifi signal, near capacity memory or CPU usage, etc, you need to be able to bring up those same conditions, so that when you test your changes, you’re actually only testing your changes. In all cases, a reference base is needed, an initial state of input. So your first step (well, after fixing the bugs), is to BE ABLE TO EASILY GO BACK TO YOUR BENCHMARK FOR REFERENCE AND COMPARISON. This also implies being able to compare the end result, not just for speed but for accuracy. Because optimization is about finding a quicker route from point A to point B, not ending up at point C.
Got that? Don’t come back until you have that.