The banker’s database
This is not from me, it’s from a teacher of mine. I thank him to this day for this wonderful image.
Most database classes will teach you how to store data for a few simple models (a store with an inventory, sales, and customers, for example) and through that many will believe in a purely relational model. 100% theoretical. Even when doing actual database work, first database designs will be very much purely relational. And changing database engines is only the best occasion to re-achieve that perfect model!
Let us consider the database for a bank. Let’s simplify it to the bare bones with two tables: customers and operations. Getting a customer’s balance is easy:
SELECT SUM(TransactionAmount) WHERE CustomerID=%CustomerID%;
Our theoretical model is so pure that deposits are positive values and withdrawals (and interest) are negative, which is really what those are. And after all, when you do your next transaction, it can just be recalculated from that. Simplicity incarnate. If you need to output the latest transactions and the balance at each step, it’s still fairly easy to do either individual SUM queries up to each point, or for the really fancy, use a running sum function (MySQL has a nice one).
Then comes our eldest member, old man Smith. He was there since the bank opened back in 1931 (obviously at some point since they moved to computers). And old man Smith has had the good habit of withdrawing only what he needed when he needed, for the past 80 years. He didn’t waste a dime, and often deposited what little loose change he had in his pockets. Dozens of transactions a week. Let’s say 2 dozens per week.
2 * 12 transacions * 52 weeks/years * 80 years = 99,840 entries
How big is our bank? Oh say perhaps a few thousand customers. Hypothetically, let’s make them a round thousand and just as old and active. That gives us a potential of 99,840,000 to sift through.
And every time he makes a deposit or withdrawal, that sum is ran for his balance. All the time. Think of it this way: the first year of transactions (1252) will give the same result for the next 79 other years. The first year will be calculated again 98,588 times.
Draw your conclusions, but the answer is quite simple: deformalize. Give each transaction an up-to-date balance, and keep the latest balance for each customer. A little bit less formal, a lot more efficient. Database optimization 101.