Dig through the annals of computer history and system design and you will find one constant across all systems at all times: the battle for resources. In most cases, this battle comes down to how much RAM there is to work with. In the past, CPU speed and core count frequently came into play as well. Over the last few decades, processor speed has far outpaced RAM density, so processor speed and core count have become less of a worry. RAM, however, is still a major concern in all system designs, especially in database engineering.
With the drive toward storing petabytes of data in a database system and then querying that data at the fastest possible speed, another contender for resource planning has entered the playing field. If this resource is not properly factored into the overall design, the result can be worse than simply not having enough RAM: the system wastes massive amounts of memory while it waits on reads and writes to complete. This contender is called IOPS.
IOPS is an acronym for Input/Output Operations Per Second. It is the number of read and write operations that a system, VM or otherwise, can perform each second as it processes these massive amounts of data. Some think of IOPS only in terms of virtual machines, because that is where it is most visible: you have to pay higher fees for higher IOPS. But in reality, all systems, large and small, have an IOPS limit.
On bare metal servers, whatever hypervisor they run, the limit comes down to the bandwidth of the storage controller and the random access speed of the storage media. In the days of spinning drives, IOPS were very low, because every operation had to wait for the drive to move the head to the correct location on the spinning platter before it could start reading data. But now we are in the world of SSD and NVMe. The speed difference between spinning and solid-state technology is so huge that the two are hardly worth comparing.
NVMe is the fastest, hands down. But, because we all like numbers, here are some. A spinning drive averages 50 to 180 IOPS. A SATA-based SSD can provide 3,000 to 40,000 IOPS, depending on the quality of the drive and the SATA controller. An NVMe-based SSD can approach half a million IOPS. NVMe is becoming the standard, and everything seems happy and fast. But what happens when 10,000 different processes are writing to and reading from the same drive array? Things can go sour quickly, which is where vendor-limited IOPS come into play, and why purchasing the highest IOPS tier available is nearly as expensive as doubling the RAM in the same VM.
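To put those figures in perspective, here is a small back-of-the-envelope sketch. It assumes a hypothetical batch of one million random I/O operations and uses the upper-end IOPS numbers quoted above; real devices will vary with queue depth, block size, and controller.

```python
# Rough time to finish a batch of random I/O at the IOPS figures quoted above.
# The batch size is an assumption chosen purely for illustration.

IOPS_BY_MEDIA = {
    "spinning disk (upper end)": 180,
    "SATA SSD (upper end)": 40_000,
    "NVMe SSD (approaching)": 500_000,
}


def seconds_to_complete(operations, iops):
    """Seconds needed to finish `operations` I/Os at a sustained `iops` rate."""
    return operations / iops


if __name__ == "__main__":
    batch = 1_000_000  # hypothetical number of random reads
    for media, iops in IOPS_BY_MEDIA.items():
        print(f"{media:28s} {seconds_to_complete(batch, iops):10.1f} s")
```

Under these assumptions the same batch takes roughly an hour and a half on a spinning disk, 25 seconds on a SATA SSD, and about 2 seconds on NVMe.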
So, where does IOPS come into play directly with RAM? Well, it is pretty simple. The longer your database has to wait for reads and writes to complete, the longer it ties up system RAM while it waits. As your data set grows and your queries become more complex, the system has to spend more time on disk operations. If you saturate your IOPS so that storage is constantly at 100%, the queries and the data waiting to be ingested have to be held in RAM. New queries that come in while the system is waiting are stored in RAM too. Query results are written to RAM before being returned to the client, so they are held there as well, all waiting for a free IOPS slot to do their business.
As more queries come in and more results are sent to RAM, the IOPS stay saturated. Then, when it seems things could not get any worse, your RAM fills up, users start complaining, and, in the worst-case scenario, your database throws an out-of-memory error and everything in RAM is lost. The database either goes offline or is left in an unstable state. These are all things that we do not want to happen.
The first temptation in this situation is to add more RAM to your database system, especially in the heat of the moment, to get some angry customers off your back. Unfortunately, in a saturated IOPS situation, this is only a band-aid that will buy you a bit of time and cost a lot more money in the end, as RAM, whether physical or virtual, is expensive. Extended IOPS are also expensive.
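The feedback loop described above, and why extra RAM only buys time, can be sketched with a toy model. Everything here is an assumption for illustration: the arrival rate, the IOPS limit, the memory held per waiting request, and the idea that each request needs exactly one I/O operation. A real database behaves in far messier ways.

```python
def seconds_until_ram_full(arrival_iops, iops_limit, bytes_per_request,
                           ram_bytes, max_seconds=3600):
    """Toy model: return the second at which queued requests no longer fit
    in `ram_bytes`, or None if that never happens within `max_seconds`."""
    backlog = 0  # requests sitting in RAM, waiting for an I/O slot
    for second in range(1, max_seconds + 1):
        backlog += arrival_iops              # new work arrives
        backlog -= min(backlog, iops_limit)  # storage drains what it can
        if backlog * bytes_per_request > ram_bytes:
            return second
    return None


if __name__ == "__main__":
    GIB = 1024 ** 3
    KIB = 1024
    # Hypothetical workload: 12,000 requests/s against a 10,000 IOPS limit,
    # each waiting request holding 64 KiB of RAM.
    for ram_gib in (8, 16):
        t = seconds_until_ram_full(12_000, 10_000, 64 * KIB, ram_gib * GIB)
        print(f"{ram_gib} GiB fills after {t} s")
    # Below the limit the backlog never builds, whatever the RAM size.
    print(seconds_until_ram_full(8_000, 10_000, 64 * KIB, 8 * GIB))
```

In this model, doubling the RAM from 8 GiB to 16 GiB only moves the failure from about 66 seconds to about 132 seconds, while keeping the arrival rate under the IOPS limit avoids the backlog entirely.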
The final message here is this: before adding more RAM to fix database problems, ensure your IOPS are not saturated. Most systems have a metrics utility that can show your IOPS rate. Get this value, compare it to the IOPS available to your system, and make sure your I/O is not saturated before adding more RAM. Doing this could save money and a lot of headaches in the end.
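As one way to run that check by hand, here is a minimal sketch that assumes a Linux host, where /proc/diskstats exposes cumulative counts of completed reads and writes per device. The helper names, the device name, and the IOPS limit are hypothetical; use whatever limit your vendor or hardware actually provides, and prefer your platform's own metrics tooling where it exists.

```python
import time


def total_ops_from_diskstats_line(line):
    """Return (device, reads completed + writes completed) for one line of
    /proc/diskstats. Fields: major, minor, name, reads completed, reads
    merged, sectors read, ms reading, writes completed, ..."""
    fields = line.split()
    return fields[2], int(fields[3]) + int(fields[7])


def iops_utilization(ops_before, ops_after, interval_seconds, iops_limit):
    """Return (measured IOPS, fraction of the limit in use) between two
    samples of a cumulative operation counter."""
    measured = (ops_after - ops_before) / interval_seconds
    return measured, measured / iops_limit


def sample_total_ops(device, path="/proc/diskstats"):
    """Read the current cumulative operation count for `device` (Linux only)."""
    with open(path) as stats:
        for line in stats:
            name, ops = total_ops_from_diskstats_line(line)
            if name == device:
                return ops
    raise ValueError(f"device {device!r} not found in {path}")


def measure(device, iops_limit, interval_seconds=10):
    """Sample twice, `interval_seconds` apart, and report utilization."""
    before = sample_total_ops(device)
    time.sleep(interval_seconds)
    after = sample_total_ops(device)
    return iops_utilization(before, after, interval_seconds, iops_limit)
```

Calling something like `measure("sda", iops_limit=3000)` returns the measured rate and the fraction of the limit in use. A fraction that sits near 1.0 for long stretches suggests the storage, not the RAM, is the bottleneck.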