There might be a few organizations that know exactly how many servers they’ll require to handle their database needs for any given period of time, but for most large enterprises whose success depends upon flows of multiple terabytes of data in often unpredictable surges, this is rarely the case.
Capacity planning is a guessing game, especially when an organization's lifeblood is real-time streaming data that powers live analysis and user-facing interactions. The safest decision is to pay for peak capacity, even during times when far fewer servers are needed.
Why haven't enterprises benefited from the cloud's innate elasticity when it comes to database provisioning, and why does the problem keep getting more painful?
In this post, I’ll discuss the challenge of controlling soaring data costs, the hurdles that stood in the way of true consumption-based usage models for cloud-based database services—and how the advent of serverless data can improve the economics for cost-conscious, innovative businesses.
Data costs keep climbing
Data architectures have become increasingly complex as enterprises have needed to build highly responsive applications that can analyze data in real time and tailor experiences to users (a personalized recommendation based on an item placed in a shopping cart, a smartphone alert that a car tire is low, or a real-time notification of a suspicious credit card transaction). With all of these data sources, IDC estimated that more than 59 zettabytes of data would be created, captured, copied, and consumed in 2020.
Rising, non-linear costs have accompanied this volume and complexity—and this has only been exacerbated by the pandemic that has forced many companies to accelerate their digital initiatives.
“Before the COVID-19 crisis, many organizations were projecting the need for more data investment, and the crisis has likely only increased this need. With bottom lines already under pressure from the pandemic’s economic fallout, businesses might wonder where they can find the resources to meet that funding requirement.”
— McKinsey, “Reducing data costs without jeopardizing growth”
In the report cited above, McKinsey projects that spending on data-related initiatives will increase by an average of 50% from 2019 through 2021, compared with 2016 through 2018. Data costs can be the single largest line item in the budget for building an application.
What’s the hold up?
The serverless concept isn’t particularly new; it’s commonly associated with AWS Lambda, Amazon’s serverless cloud computing service. While serverless data is relatively new, the basic principle of serverless applies similarly to compute and data.
Serverless is built on next-generation public cloud services that auto-scale and charge only when used. When scale, capacity planning, and cost management are automated, the result is software that’s easier to build and maintain, and significantly cheaper.
But databases have proved challenging to make serverless. Modern databases store data by "partitioning" it across several nodes of a database cluster. Scaling the database requires adding more nodes, which, in turn, requires that the data be "rebalanced" across the nodes to keep storage and compute in proportion.
As a result, scaling up quickly is difficult, and scaling back down is laborious at best. Estimating peak loads remains exactly what it sounds like: guesswork. Enterprises either pay for capacity they never use or hit a capacity ceiling, which can crash the application and leave users unhappy.
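A toy sketch illustrates why rebalancing makes scaling expensive. Assume a naive placement scheme that assigns each key to a node by hashing it and taking the result modulo the node count (real databases use smarter schemes such as consistent hashing, which move less data, but some data movement is still unavoidable):

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Toy placement: hash the key, then take it modulo the node count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(100_000)]

# Placement before and after adding a fourth node to a three-node cluster.
before = {k: node_for(k, 3) for k in keys}
after = {k: node_for(k, 4) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys must be rebalanced")  # roughly 75%
```

Under this scheme, adding a single node forces about three quarters of all keys to move to a different node, and every moved key means data copied over the network while the cluster serves live traffic.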
Decoupling compute and storage
The tight coupling of compute and storage in databases stems from their origin as a tool for transaction processing. Keeping storage and compute “close” to one another reduced latency. Separating them by using network-based storage would have slowed things down dramatically, especially with the storage technologies available in the early days of databases.
But with modern network speeds and the high availability of cloud storage, the latency-driven requirement of keeping compute and storage closely linked isn’t as important as it once was.
With storage that’s connected via the network, and thus decoupled from compute, scaling a database no longer requires rebalancing the data, and scaling up and down becomes easier—and faster. This has opened the door for truly “serverless” database services, where data is stored separately from compute.
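The contrast with the partitioned case can be sketched in a few lines. In this toy model, data lives in a shared network-attached store rather than on any compute node's local disk, so any node can serve any key and changing the node count moves no data (illustrative only; real systems also cache hot data on compute nodes):

```python
# Hypothetical shared store standing in for network-attached cloud storage.
shared_storage = {f"user:{i}": f"row-{i}" for i in range(100_000)}

def read(key: str, num_compute_nodes: int) -> str:
    # Any compute node can serve any key; the node count affects
    # throughput, not where the data lives.
    return shared_storage[key]

# Scaling from 3 to 4 compute nodes changes nothing about placement.
assert read("user:42", 3) == read("user:42", 4)
```

Because placement is independent of the compute tier, scaling becomes a matter of starting or stopping stateless nodes, which is fast in both directions.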
The agility to innovate
The ability of serverless data to let workloads dynamically scale up and down to meet the precise needs of an application has led to the rise of some wildly successful companies. Snowflake’s leading position as a cloud-based provider of analytical database services stems in part from how it decoupled compute and storage to offer pay-as-you-go service.
The value-driven pricing model that’s enabled by serverless database services is extremely attractive to cost-conscious enterprises, as they only pay for consumed storage and database operations that are actually executed. Capacity planning can become a thing of the past.
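The economics can be made concrete with some back-of-the-envelope arithmetic. All prices and workload figures below are made up for illustration; they are not any vendor's actual rates:

```python
# Hypothetical prices and workload, for illustration only.
PRICE_PER_NODE_HOUR = 0.50       # provisioned cluster: pay per node-hour
PRICE_PER_MILLION_OPS = 0.25     # serverless: pay per million operations

# A bursty day: two peak hours at 50M ops/hour, 2M ops/hour otherwise.
hourly_ops = [50e6 if h in (12, 13) else 2e6 for h in range(24)]

# Provisioned: size the cluster for the peak hour and run it all day.
NODE_CAPACITY_OPS_PER_HOUR = 5e6
peak_nodes = max(hourly_ops) / NODE_CAPACITY_OPS_PER_HOUR  # 10 nodes
provisioned_cost = peak_nodes * PRICE_PER_NODE_HOUR * 24

# Serverless: pay only for operations actually executed.
serverless_cost = sum(hourly_ops) / 1e6 * PRICE_PER_MILLION_OPS

print(f"provisioned for peak: ${provisioned_cost:.2f}/day")  # $120.00/day
print(f"pay-per-operation:    ${serverless_cost:.2f}/day")   # $36.00/day
```

The spikier the workload, the wider this gap grows: the provisioned cluster is billed for its peak size around the clock, while the serverless bill tracks actual usage.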
Besides solving scaling challenges, serverless data simplifies several other aspects of running databases. It can improve overall performance by eliminating the need to manually tune databases for particular workload types, and it removes the weeks (or even months) of waiting for infrastructure provisioning decisions. Serverless also reduces contention over access to test databases, since every developer can have their own, and it cuts the cost of overprovisioned testing environments built to replicate production.
In a broader sense, serverless data brings with it a freedom to innovate. Companies can invest in a highly scalable database knowing that they won’t run out of capacity if the small project that their developers are working on takes off and turns into something big.
Serverless data technologies are set to help businesses “bend the curve” on soaring data costs—and create new value from data that might otherwise not have been possible.
Learn more about how DataStax is helping enterprises improve the economics of their data infrastructure with serverless databases.