DR for Azure Data Platform – Architecture

Use case definition

To support this worked example, the fictitious firm “Contoso” will be used with an Azure Data Platform based upon Microsoft Reference Architectures.

Data Service – Component View

Contoso has implemented the following foundational Azure structure, which is a subset of the Enterprise Landing Zone.

The numbers in the following descriptions correspond to the preceding diagram.

Contoso’s Azure Foundations – Workflow

  1. Enterprise Enrollment – Contoso’s top parent enterprise enrollment within Azure reflecting its commercial agreement with Microsoft, its organizational account structure and available Azure subscriptions. It provides the billing foundation for subscriptions and how the digital estate is administered.
  2. Identity and Access Management – The components required to provide identity, authentication, resource access and authorization services across Contoso’s Azure footprint.
  3. Management Group and Subscription Organization – A scalable group hierarchy aligned to the data platform’s core capabilities, allowing operationalization at scale using centrally managed security and governance where workloads have clear separation. Management groups provide a governance scope above subscriptions.
  4. Management Subscription – A dedicated subscription for the various management-level functions required to support the data platform.
  5. Connectivity Subscription – A dedicated subscription for the connectivity functions of the data platform enabling it to identify named services, determine secure routing and communication across and between internal and external services.
  6. Landing Zone Subscription – One-to-many subscriptions for Azure native, online applications, internal and external facing workloads and resources.
  7. DevOps Platform – The DevOps Platform that supports the Azure foundation & Data Platform. This platform contains the code base source control repository and CI/CD pipelines enabling automated deployments of IaC.

Note

Many customers still retain a large IaaS footprint. To provide recovery capabilities across IaaS, the key component to be added is Azure Site Recovery. Site Recovery orchestrates and automates the replication of Azure VMs between regions, of on-premises virtual machines and physical servers to Azure, and of on-premises machines to a secondary datacenter.

Within this foundational structure, Contoso has implemented the following elements to support its enterprise business intelligence needs.

Contoso’s Data Platform – Workflow

The workflow is read left to right, following the flow of data:

  • Data Sources – The sources or types of data that the data platform can consume from
  • Ingest – The Platform’s capability to ingest data from various sources of varying structure and speed. This design reflects a Lambda architecture
  • Store – The capability to securely store data at scale that has been ingested onto the platform
  • Process – The Platform’s capability to process data, making it “fit for purpose” for downstream processes like cleansing, standardizing and modeling. The pre-processing of data typically ensures that it's in a “position and a condition, ready for use”
  • Enrich – The capability to enhance data processed on the platform via statistical, Machine Learning or other modeling techniques or prebuilt Azure AI Services
  • Serve – The Platform’s capability to shape and present data for downstream consumption
  • Data Consumers – The individuals, applications or downstream processes that consume data from the platform’s various serving touchpoints
  • Discover and Govern – The Platform’s capabilities to govern the data it contains and ensure it's indexed, discoverable/searchable, well-described, with full lineage and is transparent to its end users and consuming processes.
  • Platform – The foundation upon which the platform is built, that is, Contoso’s Azure Foundations as described above.

Note

For many customers, the conceptual level of the Data Platform reference architecture used will align, but the physical implementation may vary. For example, ELT (extract, load, transform) processes may be performed through Azure Data Factory, and data modeling by Azure SQL Server. To address this concern, the Stateful vs Stateless Components section below provides guidance.

For the Data Platform, Contoso has selected the lowest recommended production service tiers for all components and has chosen to adopt a “Redeploy on Disaster” DR strategy based upon an operating cost-minimization approach.

The following sections will provide a baseline understanding of the DR process and levers available to customers to uplift this posture.

Azure service and component view

The following tables present a breakdown of each Azure service and component used across the Contoso – Data platform, with options for DR uplift.

Note

The sections below are organized by stateful vs stateless services.

Stateful Foundational Components

Stateless Foundational Components

  • Subscriptions
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
  • Management Groups
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
  • Azure Monitor
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
  • Cost Management
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
  • Microsoft Defender for Cloud
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
    • Notes
      • Azure Security Center is now Microsoft Defender for Cloud
  • Azure DNS
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: Single Zone – Public
    • DR Uplift options: N/A, DNS is highly available by design
  • Network Watcher
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A, Covered as part of the Azure Service
  • Virtual Networks, including Subnets, UDR & NSGs
    • Component Recovery Responsibility: Contoso
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: N/A
    • DR Uplift options: VNETs can be replicated into the secondary, paired region
  • Azure Firewall
    • Component Recovery Responsibility: Contoso
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Standard
    • DR Uplift options: Azure Firewall is highly available by design and can be created with Availability Zones for increased availability
  • Azure DDoS
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: DDoS Network Protection
    • DR Uplift options: N/A, covered as part of the Azure service
  • ExpressRoute Circuit
    • Component Recovery Responsibility: Contoso, connectivity partner and Microsoft
    • Workload/Configuration Recovery Responsibility: Connectivity partner and Microsoft
    • Contoso SKU selection: Standard
    • DR Uplift options:
    • Notes
      • ExpressRoute has built-in redundancy, with each circuit consisting of two connections to two Microsoft Enterprise edge routers (MSEEs) at an ExpressRoute Location from the connectivity provider/client's network edge
      • ExpressRoute premium circuit will enable access to all Azure regions globally
  • VPN Gateway
    • Component Recovery Responsibility: Contoso
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Single Zone – VpnGw1
    • DR Uplift options: A VPN Gateway can be deployed into an Availability Zone with the VpnGw#AZ SKUs to provide a zone redundant service
  • Azure Load Balancer
    • Component Recovery Responsibility: Contoso
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Standard
    • DR Uplift options:
    • Notes
      • Azure Traffic Manager is a DNS-based traffic load balancer. This service supports the distribution of traffic for public-facing applications across the global Azure regions. This solution will provide protection from a regional outage within a high availability design
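
The responsibility split in the list above can be captured as data, which makes it easy to see what Contoso itself has to recover under a “Redeploy on Disaster” strategy. The following is a minimal sketch only: the `Component` class, the `customer_owned` helper and the choice of rows are illustrative assumptions, not part of the reference architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    component_recovery: str  # who restores the service itself
    workload_recovery: str   # who restores configuration and workloads
    sku: str


# A few rows copied from the list above; the structure itself is an assumption.
FOUNDATION = [
    Component("Subscriptions", "Microsoft", "Microsoft", "N/A"),
    Component("Azure DNS", "Microsoft", "Microsoft", "Single Zone – Public"),
    Component("Virtual Networks", "Contoso", "Contoso", "N/A"),
    Component("Azure Firewall", "Contoso", "Contoso", "Standard"),
    Component("Azure DDoS", "Microsoft", "Contoso", "DDoS Network Protection"),
    Component("VPN Gateway", "Contoso", "Contoso", "Single Zone – VpnGw1"),
]


def customer_owned(components, owner="Contoso"):
    """Return names of components where `owner` holds either responsibility."""
    return [
        c.name
        for c in components
        if owner in c.component_recovery or owner in c.workload_recovery
    ]


print(customer_owned(FOUNDATION))
# ['Virtual Networks', 'Azure Firewall', 'Azure DDoS', 'VPN Gateway']
```

Matching on a substring rather than equality means shared entries such as “Contoso, connectivity partner and Microsoft” would also be reported as needing Contoso's attention.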

Stateful Data platform-specific services

  • Storage Account: Azure Data Lake Gen2
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: LRS
    • DR Uplift options: Storage Accounts have a broad range of data redundancy options from primary region redundancy up to secondary region redundancy
    • Notes
      • GRS is recommended to uplift redundancy, providing a copy of the data in the paired region
  • Azure Event Hubs
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Standard
    • DR Uplift options: An event hub namespace can be created with availability zones enabled. This resiliency can be extended to cover a full region outage with Geo-disaster recovery
    • Notes
      • By design, Event Hubs geo-disaster recovery doesn't replicate data, therefore there are several considerations to keep in mind for failover and failback
  • Azure IoT Hubs
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Standard
    • DR Uplift options:
    • Notes
      • IoT Hub provides Microsoft-Initiated Failover and Manual Failover by replicating data to the paired region for each IoT hub
      • IoT Hub provides Intra-Region HA and will automatically use an availability zone if created in a predefined set of Azure regions
  • Azure Stream Analytics
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Standard
    • DR Uplift options: While Azure Stream Analytics is a fully managed PaaS offering, it doesn't provide automatic geo-failover. Geo-redundancy can be achieved by deploying identical Stream Analytics jobs in multiple Azure regions
  • Azure Machine Learning
    • Component Recovery Responsibility: Contoso and Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: General Purpose, D Series instances
    • DR Uplift options:
    • Notes
  • Azure Synapse: Data Explorer Pools
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: Compute Optimized Gen2
    • DR Uplift options:
    • Notes
  • Power BI
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: Power BI Pro
    • DR Uplift options: N/A, Power BI’s resiliency is part of its SaaS offering
    • Notes
  • Azure Cosmos DB
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: Single Region Write with Periodic backup
    • DR Uplift options:
      • Single-region accounts may lose availability following a regional outage. Resiliency can be uplifted by configuring a single write region with at least one additional (read) region and enabling service-managed failover
      • It's recommended that Azure Cosmos DB accounts used for production workloads enable automatic failover. In the absence of this configuration, the account will lose write availability for the entire duration of the write region outage, as manual failover won't succeed due to lack of region connectivity
    • Notes
  • Azure Data Share
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Microsoft
    • Contoso SKU selection: N/A
    • DR Uplift options: Azure Data Share’s resiliency can be uplifted by HA deployment into a secondary region
  • Microsoft Purview
    • Component Recovery Responsibility: Microsoft
    • Workload/Configuration Recovery Responsibility: Contoso
    • Contoso SKU selection: N/A
    • DR Uplift options: N/A
    • Notes

Stateless Data platform-specific services

Stateful vs Stateless Components

The speed of innovation across the Microsoft product suite and Azure, in particular, means the component set that we've used for this worked example will quickly evolve. To future-proof against providing stale guidance and extend this guidance to components not explicitly covered in this document, the section below provides some instruction based upon the coarse-grain classification of state.

A component/service can be described as stateful if it's designed to remember preceding events or user interactions. Stateless means there's no record of previous interactions, and each interaction request has to be handled based entirely on information that comes with it.
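
The distinction can be illustrated with a toy example. This is a sketch, not a model of any Azure service; `stateless_convert` and `StatefulCounter` are hypothetical names chosen for illustration. It shows why a freshly redeployed stateless component is immediately equivalent to the original, while a redeployed stateful component has lost its history until its data is restored.

```python
def stateless_convert(amount, rate):
    """Stateless: the result depends only on the arguments of this call."""
    return amount * rate


class StatefulCounter:
    """Stateful: the result of each call depends on earlier calls."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


# A fresh copy of the stateless function behaves exactly like the old one.
assert stateless_convert(10, 2) == 20

# A fresh copy of the stateful object starts without the accumulated history.
old = StatefulCounter()
old.add(5)
old.add(7)
redeployed = StatefulCounter()
print(old.total, redeployed.total)  # 12 0
```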

For a DR scenario that calls for redeployment:

  • Components/services that are “stateless”, like Azure Functions and Azure Data Factory pipelines, can be redeployed from source control with at least a smoke test to validate availability before being introduced into the broader system
  • Components/services that are “stateful”, like Azure SQL Database and storage accounts, require more attention
    • When procuring the component, a key decision will be the selection of the data redundancy feature. This decision typically focuses on a trade-off between availability and durability with operating costs
    • Datastore components will also need a data backup strategy. The data redundancy functionality of the underlying storage mitigates this risk for some designs, while others, like SQL databases, will need a separate backup process.
    • If necessary, the component can be redeployed from source control with a smoke test to validate that it's available with the correct configuration
    • A redeployed component containing a dataset must have its dataset rehydrated. Rehydration can be accomplished through data redundancy (when available) or a backup dataset. When rehydration has been completed, it must be validated for completeness.
      • Depending on the nature of the backup process, the backup datasets may require validation before being applied. Backup process corruption/error may result in an earlier backup being used in place of the latest available
    • Any delta between component date/timestamp and the current date should be addressed by re-executing or replaying the data ingestion processes from that point forward
    • Once the component dataset is up to date, the component can be introduced into the broader system
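
Two of the stateful recovery steps above lend themselves to a small sketch: falling back to an earlier backup when the latest one fails validation, and working out which ingestion runs must be replayed to close the gap. Everything here is an assumption made for illustration: the `Backup` class, its `is_valid` flag (standing in for whatever integrity check the backup process provides), and the idea that ingestion runs are identified by timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    is_valid: bool  # outcome of a hypothetical integrity check


def choose_backup(backups):
    """Return the most recent backup that passed validation, or None."""
    valid = [b for b in backups if b.is_valid]
    return max(valid, key=lambda b: b.taken_at, default=None)


def batches_to_replay(ingestion_batches, restored_to):
    """Return ingestion timestamps newer than the restored dataset, oldest first."""
    return sorted(t for t in ingestion_batches if t > restored_to)


backups = [
    Backup(datetime(2024, 1, 1), True),
    Backup(datetime(2024, 1, 2), True),
    Backup(datetime(2024, 1, 3), False),  # the latest backup is corrupt
]
# One ingestion run at noon on each of four days.
batches = [datetime(2024, 1, d, 12) for d in range(1, 5)]

chosen = choose_backup(backups)
print(chosen.taken_at.date())                                   # 2024-01-02
print([t.day for t in batches_to_replay(batches, chosen.taken_at)])  # [2, 3, 4]
```

Because the corrupt backup forces a restore to an earlier point, the replay window grows by a day. This is the delta that the re-execution step is meant to close before the component rejoins the broader system.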

Other key services

This section contains HA/DR guidance for other key Azure Data components and services.

Next Steps

Now that you've learned about the scenario's architecture, you can learn about the scenario details.
