Armchair Architects: Exploring the relationship between Cost and Architecture

Welcome to season two of the Armchair Architects series! We received great feedback in season one, along with lots of good questions and discussion, so we’re back for another ten posts and videos from now through midsummer.

Uli Homann and Eric Charran continue to be our resident experts alongside David Blank-Endelman, host of the Azure Enablement Show. This season will also feature customer speakers and other activities leading up to the second Backstage Tour virtual event in September.

This post focuses on cost considerations when designing your cloud architecture. Too often we hear about architects or developers who strive to design the perfect architecture, only to realize once it’s in production that it’s too expensive. The move to the cloud also makes it possible to rethink costs at the point of design, because you can refactor your infrastructure choices later. That is harder in on-premises environments, where the infrastructure is already bought and paid for. It’s one of the key benefits of an OpEx model over a CapEx model.

You can also take advantage of modularity. Some application components (or workloads) require higher availability or security than others, so you can optimize those components for what they need while saving costs on the rest.

Cloud platforms provide configurable elasticity to grow and shrink infrastructure based on demand. Azure currently has over 70 regions and relationships with partners, like AT&T, who have edge zones that make it possible to scale out to whichever audience or use case your app must support. This flexibility doesn’t come only in the form of actual provisioning, but also in planning the rollout of your app. You can use cost and capacity calculators to determine timing, which could impact the overall cost/benefit structure of your app.

These benefits can be awesome, but you still need to determine your operating expenses ahead of time. You need to work forward from the design to project the solution’s operating costs. It’s almost like a negotiation: if you want x performance or y reliability, it’s going to cost you z amount.
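To make that negotiation concrete, here is a minimal Python sketch of working forward from a design to a monthly operating-cost projection. The component names, hourly prices, and instance counts are hypothetical placeholders, not real Azure rates, and 730 hours per month is a common approximation; in practice you would take prices from a pricing calculator.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation of the hours in an average month


@dataclass
class Component:
    name: str
    price_per_hour: float  # hypothetical hourly price per instance
    instances: int


def monthly_cost(components: list[Component]) -> float:
    """Project monthly operating cost: price x instances x hours, summed."""
    return sum(c.price_per_hour * c.instances * HOURS_PER_MONTH for c in components)


# Hypothetical design options: a baseline and a higher-reliability variant.
baseline = [Component("web", 0.10, 2), Component("db", 0.50, 1)]
resilient = [Component("web", 0.10, 4), Component("db", 0.50, 2)]

print(f"baseline:  {monthly_cost(baseline):.2f}")   # baseline:  511.00
print(f"resilient: {monthly_cost(resilient):.2f}")  # resilient: 1022.00
```

In this toy example, doubling the web tier and adding a standby database doubles the projected bill. That is the "y reliability costs z" trade-off, settled on paper before anything reaches production.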

Scaling-out isn’t instantaneous or automatic

Architects also need to consider the time it takes to scale out resources. You need to think about the configuration and control planes for each component so that capacity is in place in time for the predicted load. For instance, if I’m in a seasonal business where my orders might spike around the Super Bowl, I should plan a stepwise scale-up that completes in time to meet the demand. In today’s world, you still need to go to each component and flip some switches and turn some dials to do that, but do it in a way that limits the risk of scaling out or in prematurely.
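Here is a minimal Python sketch of planning such a stepwise scale-out backwards from a known peak. The instance counts, step size, per-step lead time, and peak time are hypothetical inputs. The function only produces a schedule and does not call any Azure API.

```python
from datetime import datetime, timedelta


def scale_out_schedule(peak_start: datetime, current: int, target: int,
                       step: int, lead_time: timedelta) -> list[tuple[datetime, int]]:
    """Plan a stepwise scale-out that completes just before peak_start.

    Each entry is (start_time, instance_count). Steps are planned backwards
    from the peak so each one starts as late as possible while still having
    lead_time to finish provisioning, which limits premature scale-out.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    steps = []
    count = target
    start = peak_start - lead_time
    while count > current:
        steps.append((start, count))
        count = max(count - step, current)
        start -= lead_time
    return list(reversed(steps))


# Hypothetical event: the peak begins at 18:00 and each step needs 30 minutes.
plan = scale_out_schedule(datetime(2025, 2, 9, 18, 0), current=2, target=8,
                          step=2, lead_time=timedelta(minutes=30))
for when, count in plan:
    print(when.strftime("%H:%M"), count)
# 16:30 4
# 17:00 6
# 17:30 8
```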

Another key consideration is that once you’ve scaled out one component, you may put additional pressure on other components (e.g., a persistent data tier). You need to understand how to scale each individual component. For example, does the database need to scale when you scale the compute tier? And if you scale your overall solution, your monitoring capacity will also need to increase.

The concept of scale units helps with this planning. A scale unit describes all the resources you need (e.g., management, application, compute, storage) to satisfy demand for a specific capacity. You can then take advantage of the dynamic scaling capabilities built into cloud services by triggering the necessary scaling in each tier, but in an orchestrated manner.
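One way to picture "orchestrated" is to scale tiers in dependency order, so that the data tier and monitoring grow before the compute tier that leans on them. The minimal Python sketch below assumes that ordering rule. The tier names, the dependency map, and the `scale_tier` callback are hypothetical, and the callback stands in for whatever scaling mechanism each tier actually exposes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tiers, each mapped to the tiers that must scale before it.
# The data tier and monitoring scale first so the larger compute tier
# neither overwhelms the database nor runs unobserved.
DEPENDENCIES = {
    "database": set(),
    "monitoring": set(),
    "compute": {"database", "monitoring"},
    "gateway": {"compute"},
}


def scaling_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every tier scales after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def orchestrate(deps: dict[str, set[str]], scale_tier) -> list[str]:
    """Call scale_tier (a hypothetical callback) for each tier in dependency order."""
    order = scaling_order(deps)
    for tier in order:
        scale_tier(tier)
    return order


performed = []
orchestrate(DEPENDENCIES, performed.append)
print(performed)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependency map contains a cycle. That is a useful early warning that two tiers have been declared as each needing the other to scale first.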

The scale unit concept means you plan for scaling dependent application components together according to a predefined methodology, rather than attempting to configure them in real time. This provides specific slices of capacity that can be scaled out when the situation demands it, and it allows you to scale out the solution symmetrically.

For instance, if you’re in the messaging business (e.g., email), then for 5,000 users who need a specific amount of storage, you may need a corresponding number of Exchange servers and accompanying monitoring capabilities. You then take that configuration as a single scale unit for 5,000 users and “stamp” it out for however many multiples of 5,000 users you have. You could do the same for whatever capacity measure fits your application.
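The stamping arithmetic fits in a few lines of Python. The 5,000-user unit comes from the example above. The per-unit server, storage, and monitoring figures are hypothetical sizing chosen for illustration, not Exchange guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUnit:
    """Resources that together serve one slice of capacity (hypothetical sizing)."""
    users: int
    mail_servers: int
    storage_tb: int
    monitoring_agents: int


def stamp_out(total_users: int, unit: ScaleUnit) -> dict[str, int]:
    """Round up to whole scale units, then multiply every resource symmetrically."""
    stamps = -(-total_users // unit.users)  # integer ceiling division
    return {
        "stamps": stamps,
        "mail_servers": stamps * unit.mail_servers,
        "storage_tb": stamps * unit.storage_tb,
        "monitoring_agents": stamps * unit.monitoring_agents,
    }


unit = ScaleUnit(users=5000, mail_servers=4, storage_tb=10, monitoring_agents=1)
print(stamp_out(12_000, unit))
# {'stamps': 3, 'mail_servers': 12, 'storage_tb': 30, 'monitoring_agents': 3}
```

Rounding up to whole stamps is the design choice that keeps scaling symmetric. You never run a fraction of a unit, so 12,000 users get three full stamps rather than 2.4 partial ones.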

What do you do about unexpected demand?

So, you might know what your email service will do over time, but what about spiky, unexpected demand, like a flash sale or a news event? How do you ensure that such a spike won’t bring down the entire service?

Implementing some core architecture patterns will help here. Applying the throttling pattern keeps your architecture from becoming overwhelmed. Depending on your implementation, throttling safeguards key experiences while potentially shedding other operations. In the end, you may turn away some customers, but you won’t bring down your overall application along the way.
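The post doesn’t prescribe a throttling algorithm. A token bucket is one common choice, and the minimal Python sketch below pairs it with an assumed rule: part of the bucket (`reserve`) is held back for requests marked critical, so lower-priority work is shed first. The class name, parameters, and priority flag are all hypothetical.

```python
import time


class PriorityThrottle:
    """Token-bucket throttle that sheds non-critical requests first.

    A slice of the bucket (reserve) is kept for critical requests, so key
    experiences keep working while lower-priority operations are rejected.
    """

    def __init__(self, rate_per_sec: float, capacity: float, reserve: float,
                 clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.reserve = reserve
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def allow(self, critical: bool) -> bool:
        """Return True to admit the request, False to shed it."""
        self._refill()
        floor = 0 if critical else self.reserve  # non-critical work can't use the reserve
        if self.tokens - 1 >= floor:
            self.tokens -= 1
            return True
        return False


throttle = PriorityThrottle(rate_per_sec=0, capacity=5, reserve=2)  # no refill, for illustration
print([throttle.allow(critical=False) for _ in range(4)])  # [True, True, True, False]
print([throttle.allow(critical=True) for _ in range(3)])   # [True, True, False]
```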

You also need to design for resiliency, so that even if a spike does occur and brings the service down, the impact on the user experience is minimal.
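The post doesn’t name a specific resiliency mechanism. A circuit breaker with a degraded fallback is one widely used option, and the minimal Python sketch below shows the idea under that assumption. The thresholds, the simulated `fetch_recommendations` outage, and the "bestsellers" fallback are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency and serve a fallback instead.

    After failure_threshold consecutive failures the breaker opens. Calls go
    straight to the fallback until reset_timeout seconds pass; then a trial
    call is allowed through (half-open). One more failure reopens it.
    """

    def __init__(self, failure_threshold: int, reset_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the dependency
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # success closes the breaker
        return result


def fetch_recommendations():
    raise ConnectionError("recommendation service overloaded")  # simulated outage


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for _ in range(3):
    print(breaker.call(fetch_recommendations, fallback=lambda: ["bestsellers"]))
```

Users keep getting a generic list instead of an error page while the overloaded dependency gets time to recover.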

Avoiding geo-redundancy naivety

It’s easy to assume that a cloud hyperscaler’s zone and region redundancy will keep applications available in the event of a catastrophic failure. In some instances, the cloud provider will fail over to other zones within the region. A more severe failure, whether in the application, the infrastructure, or the datacenter itself, may require failing over to another region entirely. In these scenarios, architects must be vigilant about how their application will behave and how users will be routed to the closest available region.

A prepared standard operating procedure will help here. For example, all traffic from the impacted region should be rerouted to the physically closest unaffected region. A second, more robust approach is active-active management of the resources, with a hot link between regions. Here you use Azure Traffic Manager or Azure Front Door to manage load balancing, applying traffic shaping and traffic management to keep load balanced across the regions. This lets you avoid a ‘failover’ altogether, because the other regions simply take over. It can be more expensive, though, unless you already have a global footprint.
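Azure Traffic Manager and Azure Front Door have their own routing configuration, and this sketch doesn’t use their APIs. It only illustrates, in plain Python, the routing rule described above: send users to the closest region that is currently healthy. The region names, distances, and health flags are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    distance_km: float  # hypothetical distance from the user
    healthy: bool       # e.g., the latest health-probe result


def route(regions: list[Region]) -> Region:
    """Send the user to the closest region that is currently healthy."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.distance_km)


regions = [
    Region("region-a", distance_km=100, healthy=False),  # the impacted region
    Region("region-b", distance_km=800, healthy=True),
    Region("region-c", distance_km=2500, healthy=True),
]
print(route(regions).name)  # region-b
```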

But how do you keep costs down with an active-active scenario? Well, it depends on the SLA and business expectations. If an active-passive deployment can bring up the other environment quickly enough that the difference in recovery time doesn’t materially matter to the business, then active-passive makes more sense.
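The decision can be framed as picking the cheapest topology whose recovery time still meets the business target, often called a recovery time objective (RTO; the term is ours, not the post’s). In this minimal Python sketch, the monthly costs and recovery times are hypothetical figures for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topology:
    name: str
    monthly_cost: float      # hypothetical projected run cost
    recovery_minutes: float  # hypothetical time until the other region serves traffic


def cheapest_meeting_rto(options: list[Topology], rto_minutes: float) -> Optional[Topology]:
    """Pick the cheapest topology whose recovery time fits the business RTO."""
    viable = [t for t in options if t.recovery_minutes <= rto_minutes]
    return min(viable, key=lambda t: t.monthly_cost) if viable else None


options = [
    Topology("active-active", monthly_cost=2000, recovery_minutes=0),
    Topology("active-passive", monthly_cost=1200, recovery_minutes=45),
]
print(cheapest_meeting_rto(options, rto_minutes=60).name)  # active-passive
print(cheapest_meeting_rto(options, rto_minutes=5).name)   # active-active
```

When the business can tolerate the standby’s spin-up time, the cheaper active-passive option wins. Only a tight target forces the always-on cost of active-active.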


So, what’s the takeaway? Cost is now an active ingredient in planning the workload – because it can be. Take advantage of the cloud by planning for elasticity correctly and responsibly the first time.

Look for the next episode in season two of Armchair Architects. We’ll be talking about how to become a successful cloud architect. In the meantime, dive a little deeper into this topic by watching the video below.
