Optimizing your AWS Infrastructure for Sustainability, Part I: Compute

As organizations align their business with sustainable practices, it is important to review every functional area. If you’re building, deploying, and maintaining an IT stack, improving its environmental impact requires informed decision making.

Source: Optimizing your AWS Infrastructure for Sustainability, Part I: Compute

This three-part blog series provides strategies to optimize your AWS architecture within compute, storage, and networking.

In Part I, we provide success criteria and metrics for IT executives and practical guidance for architects/developers on how to adjust the compute layer of your AWS infrastructure for sustainability. To help you achieve this goal, we provide key metrics, show commonly used services and tools, provide insights into analyzing them with Amazon CloudWatch, and provide steps for improving them.

Our commitment to sustainable practices

At AWS, we are committed to running our business in the most environmentally friendly way possible. We also work to enable our customers to use the benefits of the cloud to better monitor and optimize their IT infrastructure. As reported in The Carbon Reduction Opportunity of Moving to Amazon Web Services, our infrastructure is 3.6 times more energy efficient than the median US enterprise data center, and moving to AWS can lower your workload’s carbon footprint by 88% for the same task.

That said, sustainability is a shared responsibility between AWS and our customers. As shown in Figure 1, we optimize for sustainability of the cloud, while customers are responsible for sustainability in the cloud, meaning they must optimize their workloads and resource utilization.

Figure 1. Shared responsibility model for sustainability

To reduce the amount of energy spent by your workload, you must use your resources efficiently. “The greenest energy is the energy we don’t use” as Peter DeSantis, VP of AWS Global Infrastructure, has said. But when we have to, using the fewest number of resources possible, and using these to the fullest will result in the lowest impact on the environment. Translating this to your architectural decisions means that the more efficient your architecture is, the more sustainable it will be.

We acknowledge that every AWS architecture is different and there is no one size fits all solution. Therefore, keep in mind that you should assess first if the proposed recommendations adhere to your specific requirements.

Optimizing the compute layer of your AWS infrastructure

Compute services make up the foundation of many customers’ workloads, which brings great potential for optimization. When you look at the energy proportionality of hardware (the ratio between consumed power and utilization), an idle server still consumes power, as shown in Original Postubs/archive/33387.pdf" target="_blank">The Case for Energy-Proportional Computing. Thus, you can improve the efficiency of your workload by using the fewest number of compute resources and achieving a high utilization. Implementing the recommendations in the following sections will help you use resources more efficiently and save costs.

Reducing idle resources and maximizing utilization

Energy-Proportional Computing: A New Definition reports that compute resources should be maximized, on average 70-80%. This is because energy efficiency quickly decreases as performance drops. The following table shows you the most widely used compute services, the metrics they measure, and their associated user guides.

ServiceMetricSource
Amazon Elastic Compute Cloud (Amazon EC2)CPUUtilizationList of the available CloudWatch metrics for your instances
Total Number of vCPUsMonitor metrics with CloudWatch
Amazon Elastic Container Service (Amazon ECS)CPUUtilizationAvailable metrics and dimensions
Amazon EMRIsIdleMonitor metrics with Cloudwatch

You can monitor these metrics with the architecture shown in Figure 2. CloudWatch provides a unified view of your resource metrics.

SaleBestseller No. 1
Acer Aspire 3 A315-24P-R7VH Slim Laptop | 15.6" Full HD IPS Display | AMD Ryzen 3 7320U Quad-Core Processor | AMD Radeon Graphics | 8GB LPDDR5 | 128GB NVMe SSD | Wi-Fi 6 | Windows 11 Home in S Mode
  • Purposeful Design: Travel with ease and look great...
  • Ready-to-Go Performance: The Aspire 3 is...
  • Visibly Stunning: Experience sharp details and...
  • Internal Specifications: 8GB LPDDR5 Onboard...
  • The HD front-facing camera uses Acer’s TNR...
Bestseller No. 2
HP Newest 14" Ultral Light Laptop for Students and Business, Intel Quad-Core N4120, 8GB RAM, 192GB Storage(64GB eMMC+128GB Micro SD), 1 Year Office 365, Webcam, HDMI, WiFi, USB-A&C, Win 11 S
  • 【14" HD Display】14.0-inch diagonal, HD (1366 x...
  • 【Processor & Graphics】Intel Celeron N4120, 4...
  • 【RAM & Storage】8GB high-bandwidth DDR4 Memory...
  • 【Ports】1 x USB 3.1 Type-C ports, 2 x USB 3.1...
  • 【Windows 11 Home in S mode】You may switch to...

Figure 2. CloudWatch monitors your compute resources

With an AWS Cost & Usage Report, you can understand your resource usage. AWS Usage Queries, as shown in Figure 3, is a sample solution that provides an AWS Cloud Development Kit (AWS CDK) template to create, store, query, and visualize your AWS Cost & Usage Report.

Figure 3. AWS Cost & Usage Report for monitoring your compute resources

We recommend the following services and tools to reduce idle resources and maximize utilization.

  • Integrate Amazon EC2 Auto Scaling. Setting up Amazon EC2 Auto Scaling allows your workload to automatically scale up and down based on demand.
  • Right-size resources with AWS Cost Explorer and AWS Graviton2. When you decide on an instance type, you should consider the requirements of your workload.
    • Consider using T instances, which come with burstable performance, if your workload has several unpredictable spikes. This reduces the need to over-provision capacity.
    • Use AWS Cost Explorer to see right-sizing recommendations for your workload. It highlights opportunities to reduce cost and improve resource efficiency by downsizing instances where possible.
    • Improve the power efficiency of your compute workload by switching to Graviton2-based instances. Graviton2 is our most power-efficient processor. It delivers 2-3.5 times better CPU performance per watt than any other processor in AWS. Additionally, Graviton2 provides up to 40% better price performance over comparable current generation x86-based instances for various workloads.
  • Adopt a serverless, event-driven architecture. Consider adopting a serverless, event-driven architecture to maximize overall resource utilization. Serverless architecture removes the need for you to run and maintain physical servers, because it is abstracted by AWS services. Because the cost of serverless architectures generally correlates with the level of usage, your workload’s cost efficiency will improve.
    • For asynchronous workloads, use an event-driven architecture so that compute is only used as needed and not in an idle state while waiting.

Shape demand to existing supply

In contrast to matching supply to demand through methods such as automatic scaling, you can shape demand to match existing supply. This strategy makes sense especially for flexible workloads where the exact time to run a certain operation is not relevant. This could include a recurring scheduled task that runs overnight or a workload that can be interrupted.

We recommend the following services and tools to shape demand to existing supply:

  • Use Amazon EC2 Spot Instances. Spot Instances take advantage of unused EC2 capacity in the AWS Cloud. By shaping your demand to the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity.
    • Spot Instances save you up to 90% in cost compared to On-Demand Instances.
    • Use Spot Instances for fault-tolerant, flexible, and stateless workloads that can be interrupted.
  • Apply jitter to scheduled tasks. Avoid load spikes by applying jitter to scheduled tasks and shift time-flexible workloads.
    • Assess if your scheduled tasks can be distributed to run at a random time during the hour.
    • Avoid using 0 as the start minute of scheduled tasks. This is the common default value. Rather, use a number between 2 and 58 to define the start minute for scheduled tasks.

Conclusion

New
Naclud Laptops, 15 Inch Laptop, Laptop Computer with 128GB ROM 4GB RAM, Intel N4000 Processor(Up to 2.6GHz), 2.4G/5G WiFi, BT5.0, Type C, USB3.2, Mini-HDMI, 53200mWh Long Battery Life
  • EFFICIENT PERFORMANCE: Equipped with 4GB...
  • Powerful configuration: Equipped with the Intel...
  • LIGHTWEIGHT AND ADVANCED - The slim case weighs...
  • Multifunctional interface: fast connection with...
  • Worry-free customer service: from date of...
New
HP - Victus 15.6" Full HD 144Hz Gaming Laptop - Intel Core i5-13420H - 8GB Memory - NVIDIA GeForce RTX 3050-512GB SSD - Performance Blue (Renewed)
  • Powered by an Intel Core i5 13th Gen 13420H 1.5GHz...
  • Equipped with an NVIDIA GeForce RTX 3050 6GB GDDR6...
  • Includes 8GB of DDR4-3200 RAM for smooth...
  • Features a spacious 512GB Solid State Drive for...
  • Boasts a vibrant 15.6" FHD IPS Micro-Edge...
New
HP EliteBook 850 G8 15.6" FHD Laptop Computer – Intel Core i5-11th Gen. up to 4.40GHz – 16GB DDR4 RAM – 512GB NVMe SSD – USB C – Thunderbolt – Webcam – Windows 11 Pro – 3 Yr Warranty – Notebook PC
  • Processor - Powered by 11 Gen i5-1145G7 Processor...
  • Memory and Storage - Equipped with 16GB of...
  • FHD Display - 15.6 inch (1920 x 1080) FHD display,...
  • FEATURES - Intel Iris Xe Graphics – Audio by...
  • Convenience & Warranty: 2 x Thunderbolt 4 with...

In this blog post, we discussed key metrics and recommended actions you can take to optimize your AWS infrastructure for resource efficiency. This will in turn improve the sustainability of your compute resources.

As your business grows, it is natural that metrics like “Total vCPUs hours” will increase. This is why it is important to track these metrics per unit of work, such as number of vCPUs per 100 users or transactions. This way, the KPIs you measure are independent of your business growth.

In the next part of this blog post series, we show you how to optimize the storage part of your IT infrastructure for sustainability in the cloud!