Managed Virtual Networks and DNS for Synapse Analytics to enable a seamless data estate

Overview

While working with many customers who have implemented Azure Synapse Analytics and a data management landing zone, we have seen them face a few challenges in enabling seamless hybrid connectivity across Azure Synapse services and data integration. This blog covers a few nuances of Managed Virtual Network and how data flows end to end between Azure Synapse and on-premises.

Managed Virtual Network is a new capability that eases network configuration when creating services such as Azure Data Factory, Azure Synapse Analytics and Azure Purview. It builds on Private Link and Private Endpoints. Private Endpoints give many Azure services a network interface in a customer Virtual Network, and each Azure resource gets a private IP address from that Virtual Network's address space. This provides secure access to specific Azure resources over ExpressRoute or site-to-site (S2S) VPN and from within Azure.

With Managed VNet for Azure Synapse, there are a few additional aspects:

  • Azure Synapse Managed private endpoints.

These are private endpoints from the Synapse workspace to other Azure resources, for example ADLS Gen2, Cosmos DB or Azure SQL Database, and they are managed by the workspace. Traffic between the Azure Synapse workspace and those resources traverses the Microsoft backbone network over Private Link.

  • Managed Virtual Networks are not visible to users. They are managed by Azure Synapse workspace.
  • Azure Synapse Private Link Hubs

Synapse provides Synapse Studio, a web UI used to create artifacts such as notebooks, pipelines and SQL scripts. Private endpoint connections are added to Private Link hubs so that Synapse Studio can be accessed over a private IP address.

  • Private DNS

Azure Private DNS handles name resolution for Azure resources that have private endpoint connections. This is a key aspect of enabling hybrid connections from on-prem. More on this later.

Prerequisites

Please review the below articles for more information.

Managed Virtual Network in action

An example Managed Virtual Network is depicted below to illustrate a few key components and flows.

managedvnetdns1

There are three main flows:

  1. Connectivity from On-prem to Azure Synapse Workspace.

Data analysts and data engineering teams who create pipelines and notebooks connect to Synapse Studio and to serverless and dedicated SQL pools, and securely access data stored on ADLS Gen2 and other Azure data sources.

  2. Connectivity from Azure Synapse Workspace to Azure Data Lake Gen2.

Azure Synapse SQL engines and Spark pools connect to ADLS Gen2 for data exploration and data processing.

  3. Connectivity from Azure Synapse Workspace to On-prem data sources.

Azure Synapse Pipelines and Mapping Data Flows need to connect to on-prem data sources such as SAP, file servers, Oracle and SQL Server databases for ETL.

These scenarios can get complicated for a very large team that is spread across geolocations, accesses many environments (Dev, QA, Prod, etc.) and connects to many data lakes (possibly tens or even hundreds). For the scenarios to work seamlessly, a few infrastructure components are required: an on-prem DNS server, a DNS server in Azure and Private DNS.

Landing zone 

Before the above flows can be achieved, the following infrastructure components must be in place.

  1. ExpressRoute or Site to Site VPN connection established between on-premises and Azure Virtual network gateway.
  2. Private Link and private endpoint resources created. The Synapse workspace owner initiates a private link to the target resources, for example storage accounts. The owner of the storage account needs to approve the connection. Once approved, the private endpoint connection becomes active.
  3. DNS servers configured, and A record entries created. Three DNS servers are required, as shown in the diagram above. Individual A records can be created instead if there are only a few Azure resources.
  • On-prem DNS

The on-prem DNS server uses conditional forwarding to send queries for the Azure resource domains to the custom DNS server in Azure.

Conditional forwarding is the practical choice when tens or hundreds of Azure resource names need to be resolved: instead of configuring an A record for each resource, the whole domain is forwarded. In this example, 10.15.0.4 is the private IP address of the custom DNS server in the Azure VNet.

on-prem-dns
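
To make the forwarding rule concrete, here is a minimal Python sketch of the decision an on-prem DNS server makes for each query. It is only a model of the behaviour, not DNS server configuration. The zone list and the helper name `pick_forwarder` are assumptions for illustration; 10.15.0.4 is the custom DNS address used in this post.

```python
from typing import Optional

# Hypothetical forwarder table: zone suffix -> forwarder IP.
# 10.15.0.4 is the custom DNS server in the Azure VNet from this post;
# the zones listed are examples and should match the resources in use.
CONDITIONAL_FORWARDERS = {
    "dfs.core.windows.net": "10.15.0.4",
    "sql.azuresynapse.net": "10.15.0.4",
    "dev.azuresynapse.net": "10.15.0.4",
}


def pick_forwarder(query_name: str) -> Optional[str]:
    """Return the forwarder for the longest matching zone suffix, or None."""
    name = query_name.rstrip(".").lower()
    best_zone = None
    for zone in CONDITIONAL_FORWARDERS:
        # Match on a label boundary so "notdfs.core.windows.net" does not match.
        if name == zone or name.endswith("." + zone):
            if best_zone is None or len(zone) > len(best_zone):
                best_zone = zone
    return CONDITIONAL_FORWARDERS[best_zone] if best_zone else None


# A data lake name is forwarded to Azure; an internal name is not.
print(pick_forwarder("contosolake.dfs.core.windows.net"))
print(pick_forwarder("intranet.corp.example"))
```

Names that match no zone return `None`, meaning the on-prem server resolves them as it normally would.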

  • Custom DNS or Proxy DNS

This is a custom DNS server (hosted in a VM) provisioned in an Azure VNet that forwards DNS lookup queries to Azure DNS (168.63.129.16). It is needed because Azure DNS only answers queries that originate inside Azure, and it resolves private zone records only for VNets that are linked to the private zones.

custom-dns

  • Private DNS

A private DNS zone is created for each Azure resource type. This is needed for Azure DNS to resolve Azure domain names to private IP addresses.

private dns
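
As a rough illustration of "one private zone per resource type", the Python sketch below maps a public endpoint name to the record name expected in the matching private DNS zone. The zone names are the commonly documented `privatelink` zones and should be verified against current Azure documentation. The function name `privatelink_name` and the sample hostnames are assumptions for illustration.

```python
from typing import Optional

# Public endpoint suffix -> private DNS zone (verify against Azure docs).
PRIVATE_ZONES = {
    "dfs.core.windows.net": "privatelink.dfs.core.windows.net",    # ADLS Gen2
    "blob.core.windows.net": "privatelink.blob.core.windows.net",  # Blob storage
    "sql.azuresynapse.net": "privatelink.sql.azuresynapse.net",    # Synapse SQL pools
    "dev.azuresynapse.net": "privatelink.dev.azuresynapse.net",    # Synapse dev endpoint
}


def privatelink_name(public_fqdn: str) -> Optional[str]:
    """Return the privatelink record name for a public FQDN, or None."""
    name = public_fqdn.rstrip(".").lower()
    for suffix, zone in PRIVATE_ZONES.items():
        if name.endswith("." + suffix):
            # Keep the host part and swap the suffix for the private zone.
            host = name[: -len(suffix) - 1]
            return f"{host}.{zone}"
    return None


print(privatelink_name("contosolake.dfs.core.windows.net"))
print(privatelink_name("contosows-ondemand.sql.azuresynapse.net"))
```

The A record holding the private IP address lives under the returned name, which is why each resource type needs its own zone.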


Hybrid Connectivity end to end

Connectivity from On-prem to Azure Synapse Workspace

With the above infrastructure components and configuration in place, on-prem workstations and developer machines can resolve Azure Synapse Studio, SQL domain names and data lake endpoints to private IP addresses.

managedvnetdns-connection1

As you can see above, when an on-prem machine connects to Synapse Studio and the user navigates to the data lake, both the Studio and ADLS Gen2 domain names are resolved to private IP addresses.
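
One way to confirm this from an on-prem machine is to resolve the endpoints and check that every returned address is private. The following is a minimal Python sketch using only the standard library. The function names and the hostnames in the usage note are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def check_private(
    hostnames: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Dict[str, bool]:
    """Map each hostname to True only if it resolves solely to private IPs."""
    results = {}
    for host in hostnames:
        try:
            addresses = resolver(host)
        except OSError:  # includes socket.gaierror (name did not resolve)
            addresses = []
        results[host] = bool(addresses) and all(
            ipaddress.ip_address(a).is_private for a in addresses
        )
    return results
```

For example, `check_private(["contosolake.dfs.core.windows.net", "contosows.sql.azuresynapse.net"])` run on a workstation should report `True` for both names if the DNS chain is working. A `False` usually means the query was answered with a public address or did not resolve at all.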

Connectivity from Azure Synapse workspace to Azure Data lake Gen2

With Managed Private Endpoints in the Synapse workspace, traffic between the Synapse compute engines (serverless SQL, dedicated SQL and Spark pools) and the data lake traverses the Microsoft backbone to reach the private endpoints. In the example below, a notebook running on a Spark pool connects to a data lake through a private endpoint.

managedvnetdns-connection2
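
For reference, a notebook typically addresses ADLS Gen2 through an `abfss://` URI. The small helper below is a sketch of how that URI is formed. The account, container and path values are hypothetical, and the Spark call is shown only as a comment because it needs a Synapse Spark session.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a container in an ADLS Gen2 account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


uri = abfss_uri("contosolake", "raw", "/sales/2022")
print(uri)

# In a Synapse notebook, where a SparkSession named `spark` is predefined,
# the data could then be read with, for example:
#   df = spark.read.parquet(uri)
```

The hostname in the URI is the same `dfs.core.windows.net` name that the managed private endpoint serves, so the code does not change when the endpoint is private.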

Connectivity from Azure Synapse Workspace to On-prem data sources

With Managed Virtual Network, the pipeline integration runtime is provisioned in the Managed Virtual Network and is managed (patching, NSGs, firewall) by the Azure platform. In the connection tests below, the Synapse pipeline connects to an on-prem file server using a self-hosted integration runtime (SHIR) hosted on-premises, and to the data lake private endpoint using the AutoResolve integration runtime hosted in the Managed Virtual Network.

Connection to an On-prem file server with self hosted integration runtime

Connection to Data Lake with Auto resolve integration runtime

Conclusion

Thus, with Managed Virtual Networks and Private Endpoints, hybrid data management and data security are further simplified, enabling a seamless data estate spanning on-premises and Azure.

https://techcommunity.microsoft.com/t5/azure-architecture-blog/managed-virtual-networks-and-dns-for-synapse-analytics-to-enable/ba-p/3268509