Data is the raw material of the digital economy. This is apparent from the growth of data products and services that support business processes and deliver value to internal or external data consumers. However, while data enables many critical use cases, such as business intelligence and machine learning, it can also expose organizations to security and compliance risks, such as privacy infringement, exposure of sensitive data and unauthorized access to data. To mitigate the risk of information exposure, security and data organizations should collaborate to design controls that enforce the access policies identified by data owners and/or stakeholders. Data access authorization systems are used to implement these controls. Assuming your organization has established an appropriate data governance framework and operating model, it is important to understand where access control fits conceptually into data and analytics (D&A) pipelines before designing and implementing these controls, as shown below:
The D&A pipeline typically has three segments:
- Upstream segment: This includes data sources such as (internal) operational systems, external systems and streaming data. It also includes integration capabilities to feed data into data platforms.
- Midstream segment: This covers data infrastructure in general, including data platforms that transform and persist data, such as data warehouses, data lakes and data meshes. It also includes the data ingestion and processing capabilities that prepare data for analysis.
- Downstream segment: This includes the final data preparation capabilities for aggregating data. Subsequently, it provides services to deliver data via the data access layer for consumption in different use cases. The access layer capabilities can be externalized or embedded in data platforms.
The data access layer provides a set of capabilities to implement a policy-based data access authorization system (among other features, such as information hiding and replacement of technical names with user-friendly names). This enables authenticated users to access authorized data based on their entitlements, in the right context, and for the right purpose. The data access layer spans the midstream and downstream segments to centralize data access policy management while enabling decentralized data access policy enforcement for a variety of data use cases.
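The split between centralized policy management and decentralized enforcement can be illustrated with a minimal sketch. Everything here is hypothetical and chosen for illustration only: the class names `PolicyDecisionPoint` and `EnforcementPoint`, the `sales_orders` dataset, and the entitlement and purpose values do not come from any specific product. One central object holds the policies, and each enforcement point (for example, one embedded in a warehouse and one in a BI tool) asks it for a decision before returning data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: who may use a dataset, and for what purpose."""
    dataset: str
    allowed_entitlements: frozenset
    allowed_purposes: frozenset


class PolicyDecisionPoint:
    """Central place where policies are managed and decisions are made."""

    def __init__(self):
        self._policies = {}

    def register(self, policy):
        self._policies[policy.dataset] = policy

    def is_permitted(self, entitlements, dataset, purpose):
        policy = self._policies.get(dataset)
        if policy is None:
            return False  # deny by default when no policy exists
        has_entitlement = bool(policy.allowed_entitlements & set(entitlements))
        return has_entitlement and purpose in policy.allowed_purposes


class EnforcementPoint:
    """Decentralized enforcement: each platform or tool checks the same central policies."""

    def __init__(self, name, pdp):
        self.name = name
        self.pdp = pdp

    def read(self, entitlements, dataset, purpose):
        if not self.pdp.is_permitted(entitlements, dataset, purpose):
            raise PermissionError(f"{self.name}: access to {dataset} denied")
        return f"{self.name}: rows from {dataset}"


pdp = PolicyDecisionPoint()
pdp.register(Policy("sales_orders", frozenset({"sales_analyst"}), frozenset({"reporting"})))

# Two enforcement points share one policy definition.
warehouse = EnforcementPoint("warehouse", pdp)
bi_tool = EnforcementPoint("bi_tool", pdp)
```

Because both enforcement points consult the same decision point, a policy change made once takes effect everywhere it is enforced.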
However, the data access layer requires a complex architecture with many components to cater for the full range of data use cases and the underlying identity and data technology environments. In addition, not every commercial product supports integration with a data access layer. Many organizations recognize that the traditional approach to managing data access authorization (typically using role-based access control, or RBAC) is siloed, inefficient and hard to scale to meet their requirements across different use cases and operating units. Organizations increasingly implement a modern data access layer using authorization management products with specialized and integrated features for data governance and metadata management. These include tools for managing data access authorization in the cloud or on-premises using different access patterns, as shown below:
There are three types of modern data access authorization patterns:
- Data platform access pattern: The data platform access pattern uses the native access control capabilities of data platforms, which are augmented by a purpose-built authorization tool. This pattern is suitable for use cases where data is accessed directly by data scientists or as feeds for data science systems. Examples of such use cases are business intelligence, data science and machine learning.
- Data gateway access pattern: This pattern usually features a policy enforcement point (PEP) agent integrated with an in-line data gateway (such as a data virtualization capability or an SQL gateway). This removes any dependency on either the originating application or the data platform where you need to virtualize data. This pattern is suitable for use cases where a system accesses data through APIs and microservices, which prevents direct access to data repositories.
- Data endpoint access pattern: The data endpoint access pattern usually involves an endpoint within the authorization tool itself, which is enabled by a query engine. Acting as a proxy, the query engine provides access to the data sources registered within the authorization tool, using data virtualization and the data consumer’s context. This pattern is typically useful for direct integration into code for interactive analysis and for business intelligence and visualization tools.
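As a rough illustration of the data gateway access pattern, the sketch below shows an enforcement step sitting in line between a consumer and a data source. It is a simplified assumption rather than a description of any product: the function `gateway_query`, the policy dictionary layout, the `can_see_pii` flag and the sample rows are all hypothetical, and an in-memory list stands in for the underlying repository. The gateway filters rows by the consumer's context and masks sensitive columns before anything is returned.

```python
def gateway_query(rows, consumer, policy):
    """Hypothetical in-line enforcement: filter rows, then mask restricted columns."""
    # Row-level filtering based on the consumer's context.
    visible = [row for row in rows if policy["row_filter"](row, consumer)]
    # Column masking applies unless the consumer is explicitly allowed to see the values.
    masked = set() if consumer.get("can_see_pii") else set(policy["masked_columns"])
    return [
        {column: ("***" if column in masked else value) for column, value in row.items()}
        for row in visible
    ]


# Stand-in for a data repository that consumers never query directly.
customer_rows = [
    {"customer": "Ada", "region": "EU", "email": "ada@example.com"},
    {"customer": "Bo", "region": "US", "email": "bo@example.com"},
]

# Illustrative policy: consumers see only their own region, and email is masked.
customer_policy = {
    "row_filter": lambda row, consumer: row["region"] == consumer["region"],
    "masked_columns": ["email"],
}

reporting_service = {"id": "svc-reporting", "region": "EU", "can_see_pii": False}
result = gateway_query(customer_rows, reporting_service, customer_policy)
```

The consumer only ever receives the gateway's output, so neither the calling application nor the data platform needs its own copy of the access rules.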
Security and risk management teams should collaborate with data and analytics teams to:
- Define D&A access control requirements at design time as part of an overall data governance model. This should include an operating model that enables alignment of, and collaboration between, risk and compliance, information security, and data management organizations. The requirements should provide guidance for access policy management and approval.
- Design D&A access controls using a flexible policy-based access control (PBAC) model that provides centralized policy management while catering to a variety of use cases, organizational requirements, and technical environments. Such a model should leverage appropriate authorization architecture patterns and a combination of available mechanisms, such as RBAC, attribute-based access control (ABAC) and policy orchestration, to support decentralized policy enforcement.
- Augment the D&A access control design to address dependencies, including data discovery capabilities, data classification and tagging services, integration with metadata management systems, provision of attribute information at runtime (as needed), and deep integration with D&A platforms to enable policy orchestration.
- Evaluate and deploy a modern authorization tool to implement D&A access controls that address key policy management and enforcement requirements. Use key best practices and patterns mentioned in this report, such as the data platform, data gateway and/or data endpoint access patterns.
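To make the combination of RBAC and ABAC under a single policy more concrete, here is a minimal sketch. The role names, classification levels and the `authorize` function are hypothetical and are not taken from any particular authorization tool. The role check answers "may this role perform this action at all?", while the attribute check compares the user's clearance with a classification tag on the data, the kind of attribute that a data classification and tagging service would supply.

```python
# Hypothetical role-to-permission mapping (the RBAC part).
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "steward": {"read", "write"},
}

# Hypothetical ordering of classification tags (used by the ABAC part).
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2}


def rbac_allows(roles, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def abac_allows(user_attributes, resource_tags):
    """True if the user's clearance is at least the data's classification."""
    clearance = CLASSIFICATION_RANK[user_attributes["clearance"]]
    required = CLASSIFICATION_RANK[resource_tags["classification"]]
    return clearance >= required


def authorize(user, action, resource):
    """Policy decision: both the role check and the attribute check must pass."""
    return rbac_allows(user["roles"], action) and abac_allows(
        user["attributes"], resource["tags"]
    )


analyst = {"roles": ["analyst"], "attributes": {"clearance": "internal"}}
steward = {"roles": ["steward"], "attributes": {"clearance": "confidential"}}

web_traffic = {"name": "web_traffic", "tags": {"classification": "internal"}}
payroll = {"name": "payroll", "tags": {"classification": "confidential"}}
```

In this sketch, `authorize(analyst, "read", web_traffic)` passes both checks, whereas `authorize(analyst, "read", payroll)` fails the attribute check even though the role permits reading.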
Authorization Control for Data and Analytics Pipelines