Codacy Measures Developer Productivity using AWS Serverless

Codacy is a DevOps insights company based in Lisbon, Portugal. Since its launch in 2012, Codacy has helped software development and engineering teams reduce defects, keep technical debt in check, and ship better code, faster.

Codacy’s latest product, Pulse, is a service that helps organizations understand and improve the performance of their software engineering teams. It measures metrics such as deployment frequency, lead time for changes, and mean time to recover. Codacy’s main platform is built on AWS services such as Amazon Elastic Kubernetes Service (EKS), but for Pulse the team went a step further and built on AWS serverless.

In this post, we explore Pulse’s requirements, its architecture, and the services it is built on, including AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Pulse prototype requirements

Codacy had three clear requirements for their initial Pulse prototype.

  1. The solution must enable the development team to iterate quickly and have minimal time-to-market (TTM) to validate the idea.
  2. The solution must scale easily to meet the demands of both startups and large enterprises. This was especially important because Codacy wanted to onboard some of its existing customers to Pulse, and these customers already had massive amounts of data.
  3. The solution must be cost-effective, particularly during the early stages of the product development.

Enter AWS serverless

Codacy could have built Pulse on top of Amazon EC2 instances. However, this brings the undifferentiated heavy lifting of having to provision, secure, and maintain the instances themselves.

AWS serverless technologies are fully managed services that abstract the complexity of infrastructure maintenance away from developers and operators, so they can focus on building products.

Serverless services also scale elastically and automatically behind the scenes, so customers don’t need to worry about capacity provisioning. Furthermore, these services are highly available by design and span multiple Availability Zones (AZs) within the Region in which they are deployed. This gives customers confidence that their systems will keep running even if one Availability Zone is impaired.

AWS serverless technologies are also cost-effective, because they are billed per unit of value rather than per unit of provisioned capacity. For example, billing is based on how long a function takes to complete or how many messages are published to a queue, rather than on how long an EC2 instance runs. Customers pay only when they get value from the services, for example when serving an actual customer request.
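As a rough illustration of per-use billing, the Python sketch below estimates Lambda charges the way Lambda meters them: allocated memory in GB multiplied by billed duration in seconds (rounded up to the next millisecond), plus a per-request charge. Both prices are caller-supplied placeholders, not actual AWS rates, and the sketch assumes every invocation takes the same time and ignores the free tier.

```python
import math


def lambda_cost(invocations: int, duration_ms: float, memory_mb: int,
                price_per_gb_second: float, price_per_million_requests: float) -> float:
    """Estimate Lambda cost for a batch of invocations of equal duration.

    The two price arguments are placeholders supplied by the caller, not real
    AWS rates. The free tier is ignored.
    """
    billed_ms = math.ceil(duration_ms)               # duration is billed in 1 ms increments
    memory_gb = memory_mb / 1024                     # allocated memory in GB
    gb_seconds = invocations * memory_gb * billed_ms / 1000
    compute_charge = gb_seconds * price_per_gb_second
    request_charge = invocations / 1_000_000 * price_per_million_requests
    return compute_charge + request_charge
```

For example, one million 120 ms invocations at 512 MB come to 60,000 GB-seconds; with the placeholder prices 0.00001 per GB-second and 0.20 per million requests, the estimate is 0.80. With zero invocations the estimate is zero, which is the contrast with an EC2 instance that is billed for as long as it runs, busy or idle.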

Overview of Pulse’s solution architecture

An event is generated when a developer performs a specific action as part of their day-to-day tasks, such as committing code or merging a pull request. These events are the foundational data that Pulse uses to generate insights, and they are processed by multiple Pulse components called modules.

Let’s take a detailed look at a few of them.

Ingestion module

Figure 1. Pulse ingestion module architecture

Figure 1 shows the ingestion module, which is the entry point for events into the Pulse platform. It is built on AWS serverless services as follows:

  • The ingestion API is exposed to customers through Amazon API Gateway, which supports REST, HTTP, and WebSocket APIs with features such as request validation, rate limiting, and more.
  • The actual business logic of the API is implemented as AWS Lambda functions. Lambda can run custom code in a fully managed way. You only pay for the time that the function takes to run, in 1-millisecond increments. Lambda natively supports multiple languages, but customers can also bring their own runtimes or container images as needed.
  • API requests are authorized with API keys stored in Amazon DynamoDB, a key-value NoSQL database that delivers single-digit millisecond latency at any scale. API Gateway invokes a Lambda function that validates the key against those stored in DynamoDB (this is called a Lambda authorizer).
  • While API Gateway provides a default domain name for each API, Codacy uses a custom domain configured with Amazon Route 53, a service that registers domain names and manages DNS records. Route 53 offers a service level agreement (SLA) of 100% availability.
  • Events are stored in raw format in Pulse’s data lake, which is built on top of AWS’ object storage service, Amazon Simple Storage Service (S3). With Amazon S3, you can store massive amounts of information at low cost using simple HTTP requests. The data is highly available and durable.
  • Whenever the API ingests a new event, it publishes a message to Pulse’s message bus, which is described later in this post.
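The Lambda authorizer described above can be sketched as a REST API TOKEN authorizer in Python. API Gateway passes the caller’s key as authorizationToken together with the methodArn of the requested method, and expects an IAM policy document in return. The key lookup is injected as a function so the sketch runs without AWS; lookup_key, the customer_id attribute, and the api_key table key are hypothetical, not Codacy’s actual schema.

```python
from typing import Callable, Optional


def build_policy(principal_id: str, effect: str, resource: str) -> dict:
    """Return the IAM policy document a REST API Lambda authorizer produces."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def make_authorizer(lookup_key: Callable[[str], Optional[dict]]):
    """Create a TOKEN authorizer handler.

    lookup_key is a hypothetical function returning the stored record for an
    API key, or None. In production it could wrap a DynamoDB read such as
    table.get_item(Key={"api_key": token}).get("Item"), assuming a
    hypothetical table keyed on "api_key".
    """

    def handler(event, context=None):
        token = event.get("authorizationToken", "")
        record = lookup_key(token) if token else None
        if record is None:
            # Unknown or missing key: API Gateway rejects the request with 403.
            return build_policy("anonymous", "Deny", event["methodArn"])
        response = build_policy(record["customer_id"], "Allow", event["methodArn"])
        # Context values are passed on to the backend integration.
        response["context"] = {"customerId": record["customer_id"]}
        return response

    return handler
```

Returning a Deny policy makes API Gateway answer 403. API Gateway can also cache authorizer results for a configurable TTL, which avoids a DynamoDB read on every request that presents the same key.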
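The raw-storage and publishing steps of the ingestion API might look like the following Python sketch of a Lambda proxy-integration handler. It receives the put_object and publish operations as arguments (in production, the bound methods of boto3 S3 and SNS clients), so it can be exercised with stand-ins. The bucket name, topic ARN, S3 key layout, the required type field, and the customerId value read from the authorizer context are assumptions for illustration, and the request body is assumed to be plain (not base64-encoded) JSON.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical resource names, for illustration only.
RAW_BUCKET = "pulse-raw-events"
EVENTS_TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:pulse-events"


def make_ingest_handler(put_object, publish):
    """Create an ingestion handler.

    put_object and publish are expected to accept the same keyword arguments as
    boto3's S3 put_object and SNS publish, e.g. boto3.client("s3").put_object.
    """

    def handler(event, context=None):
        try:
            payload = json.loads(event.get("body") or "")
        except json.JSONDecodeError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        if not isinstance(payload, dict) or "type" not in payload:
            return {"statusCode": 400, "body": json.dumps({"error": "missing type"})}

        # Assumes the Lambda authorizer put customerId into the request context.
        customer_id = event["requestContext"]["authorizer"]["customerId"]
        day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        key = f"raw/customer={customer_id}/date={day}/{uuid.uuid4()}.json"

        # 1. Keep the untouched event in the data lake.
        put_object(Bucket=RAW_BUCKET, Key=key, Body=event["body"].encode("utf-8"))

        # 2. Announce it on the message bus; the attribute lets subscribers filter by type.
        publish(
            TopicArn=EVENTS_TOPIC_ARN,
            Message=json.dumps({"customerId": customer_id, "s3Key": key, "event": payload}),
            MessageAttributes={"type": {"DataType": "String", "StringValue": str(payload["type"])}},
        )
        return {"statusCode": 202, "body": json.dumps({"id": key})}

    return handler
```

The ordering is deliberate: the event is written to the data lake before it is announced, so a failed publish leaves a stored but unannounced event rather than an announcement that points at missing data.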

Events module

Figure 2. Pulse events module architecture

Figure 2 shows the events module, which aggregates and stores events so that customers can consume them:

  • A Lambda function consumes events from the message bus and stores them in Amazon Redshift.
  • Amazon Redshift is AWS’ managed data warehouse. It lets Pulse’s users get insights and metrics through high-performance analytical (OLAP) queries.
  • These metrics are exposed to customers through another API (the public API), which is also built on API Gateway.
  • The business logic for this API is implemented with Lambda functions, as in the ingestion module.
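A Lambda function in the events module that reads from an SQS queue fed by SNS could unpack messages as in the Python sketch below. With raw message delivery disabled (the SNS default), each SQS record body is an SNS notification envelope whose Message field holds the published string. The message fields (customerId, s3Key, event.type) and store_events, which stands in for the actual Redshift write, are hypothetical. The return value uses Lambda’s partial batch response format, which takes effect only when ReportBatchItemFailures is enabled on the event source mapping.

```python
import json
from typing import Callable, List


def make_events_consumer(store_events: Callable[[List[dict]], None]):
    """Create an SQS-triggered handler for the events module.

    store_events is a hypothetical function that writes rows to Amazon
    Redshift, for example through the Redshift Data API.
    """

    def handler(event, context=None):
        rows, failures = [], []
        for record in event["Records"]:
            try:
                envelope = json.loads(record["body"])      # SNS notification envelope
                message = json.loads(envelope["Message"])  # what the publisher sent
                rows.append({
                    "customer_id": message["customerId"],
                    "event_type": message["event"]["type"],
                    "s3_key": message["s3Key"],
                })
            except (KeyError, TypeError, json.JSONDecodeError):
                # Report only this message as failed; SQS redelivers it, and a
                # dead-letter queue (if configured) eventually catches it.
                failures.append({"itemIdentifier": record["messageId"]})
        if rows:
            store_events(rows)
        return {"batchItemFailures": failures}

    return handler
```

Reporting individual failures means one malformed message does not force the whole batch to be retried. If store_events itself raises, the exception fails the entire invocation and SQS makes the whole batch available again.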

Message bus

Figure 3. Message bus architecture

We mentioned earlier that Pulse’s modules communicate with each other through the “message bus.” When something happens in a specific component, it publishes a message (event) to the bus. Separately, developers create subscriptions for each module that should receive these messages. This is known as the publisher/subscriber pattern (pub/sub for short), and it is a fundamental building block of event-driven architectures.

The message bus decouples the modules from each other. A publisher does not need to know who its subscribers are, how many there are, or what to do when a new one is added; the message bus handles all of this.
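To illustrate the decoupling, here is a toy in-process publish/subscribe bus in Python. It is not how SNS works internally; it only shows that a publisher calls publish without knowing who, or how many, the subscribers are.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy synchronous publish/subscribe bus (illustration only, not SNS)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to every subscriber of topic; return how many got it."""
        # The publisher never sees the subscriber list; the bus does the fan-out.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Adding another subscriber requires no change to the publishing code. Unlike this synchronous toy, a managed message bus adds durability, retries, and independent scaling for each subscriber.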

Pulse’s message bus is built like this, shown in Figure 3:

  • Events are published through Amazon Simple Notification Service (SNS) to a construct called a topic. Topics are the basic unit of message publication and consumption. Components subscribe to the topic, and each subscription can use a filter policy to discard messages it does not need.
  • Developers configure the Amazon SNS subscriptions to deliver events to queues, which act as a buffer from which workers process messages. Queues also keep each message until a worker has processed and deleted it, so messages are not lost if processing fails. In Pulse’s case, these queues are implemented with Amazon Simple Queue Service (SQS).
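The filtering and buffering described above can be modeled with the simplified Python sketch below. It implements only exact string matching on message attributes: every key in the policy must be present (AND), and its value must be one of the listed values (OR). Real SNS filter policies also support numeric, prefix, anything-but, and exists conditions, and can filter on the message body instead of attributes. Each subscription’s queue is modeled by an in-memory deque, and the subscription names are illustrative.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def matches_filter_policy(policy: Dict[str, List[str]], attributes: Dict[str, dict]) -> bool:
    """Simplified SNS filter-policy check using exact string matches only."""
    for name, allowed in policy.items():
        attribute = attributes.get(name)
        # A missing attribute or a value outside the allowed list rejects the message.
        if attribute is None or attribute.get("StringValue") not in allowed:
            return False
    return True


def fan_out(message: str, attributes: Dict[str, dict],
            subscriptions: Dict[str, Tuple[Dict[str, List[str]], Deque[str]]]) -> List[str]:
    """Append message to the queue of every subscription whose policy matches."""
    delivered = []
    for name, (policy, queue) in subscriptions.items():
        if matches_filter_policy(policy, attributes):
            queue.append(message)  # the queue buffers it until a worker consumes it
            delivered.append(name)
    return delivered
```

A subscription with no filter policy (an empty dict here) receives every message, which matches SNS behavior.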

Other modules

Other parts of Pulse’s architecture also use AWS serverless. For example, Amazon Cognito handles user sign-up and authentication, and Pulse’s frontend application is hosted on Amazon S3 and served to customers worldwide with low latency through Amazon CloudFront, a content delivery network.

Summary and next steps

By using AWS serverless, Codacy has been able to reduce the time required to bring Pulse to market by staying focused on developing business logic, rather than managing servers. Furthermore, Codacy is confident they can handle Pulse’s growth, as this serverless architecture will scale automatically according to demand.
