Third-Party Data Is Now First-Class

Everybody should travel first-class some of the time.

Adrian Bridgwater

Data moves. When we create a datum (singular) and eventually coalesce pieces of information into the variously shaped datasets that we now send across the neural networked nodes of the web and the cloud, we push data down, across and through different transport mechanisms to use it in destinations (applications, databases, web services, basically everything) that are distinctly different from where those first pieces of data originated.

This is nothing new. We have been exchanging information across networks for considerably more than half a century.

But data sharing 1.0 was a comparatively conservative discipline, i.e. we primarily connected to sources that fell inside the confines of the way software and hardware systems were architecturally designed from the outset and basically only really networked with ourselves. When we needed to download X from source Y, we connected in the prescribed way. It was very much a case of fitting plugs into known sockets and turning the power on.

Data sharing has left the building

Although those basic laws and principles of data connectivity remain, a more prolific data integration and sharing era has come about.

We now live in a time when data sharing has left the building. Enterprise organizations are now taking a much broader approach to data integration. Often making use of anonymized and obfuscated datasets that protect the Personally Identifiable Information (PII) that would identify actual individuals, companies are exchanging datasets with a view to understanding the ‘shape’ of information values in order to be able to adapt to the known-knowns that other firms in the same market segment may have already experienced.

Suddenly now we get to a point where a company’s own information is of paramount importance, but we also reach a new level where third-party data is also massively valuable. This is the point at which third-party data has now become a first-class citizen in the information ecosystem that any business operates in.

If, for example, a bargain-price fashion brand starts trading in raincoats, then doesn’t it make sense if that business is able to view cost figures experienced by another firm in the same or similar trade? Shouldn’t oil & gas companies be able to see how employee work rosters are executed in other similar operations in other nations, even in competitors? The list of data-sharing examples (over and above these two very random ones) goes on and on.

How third-party data becomes first-class

There are now data exchange services that work rather like commodities exchanges to help organizations share data templates, architectures and shapes, but making third-party data first class is also happening at a core level in many enterprise technology platform vendors. So how does it all work?

“Organizations are realising the value of data collaboration as data becomes more readily available and plentiful,” said David Fisher, industry principal, media and entertainment EMEA at data cloud company Snowflake. “Rather than relying solely on their own data, entities are tapping into data that exists across the business ecosystem. This approach is opening doors for multiple parties to combine, collaborate and analyse data to deliver new business opportunities and enhanced customer experiences.”

The birth of data clean rooms

To facilitate this growing trend, so-called ‘data clean rooms’ are becoming increasingly popular. Fisher describes these technologies as a means of allowing multiple organizations to join, come together and collaborate on data in a protected environment where participants are unable to see each other’s raw data. Sensitive data, such as PII, can be matched, but it is encrypted for each party. In a Snowflake data clean room, the data is not copied, moved, or shared, and analysis on large amounts of data can be carried out with high performance and scalability.
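To make the matching idea concrete, here is a minimal sketch in Python of how two parties might compare customer lists without either seeing the other's raw identifiers. It is an illustration only and not how Snowflake (or any vendor) implements a clean room: the shared key, the minimum group size and the function names are all assumptions made up for this example, and a keyed hash stands in for the per-party encryption Fisher describes.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical values for illustration; a real clean room manages keys and
# thresholds for the participants rather than hard-coding them.
SHARED_KEY = b"demo-key-agreed-out-of-band"
MIN_GROUP_SIZE = 2


def blind(identifier: str) -> str:
    """Return a keyed hash of a normalised identifier so raw PII is never compared."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(SHARED_KEY, normalised, hashlib.sha256).hexdigest()


def overlap_count(party_a: List[str], party_b: List[str]) -> Optional[int]:
    """Count the customers both parties share, or None if the group is too small."""
    matched = {blind(x) for x in party_a} & {blind(x) for x in party_b}
    # Only an aggregate leaves the function, and only above the threshold.
    return len(matched) if len(matched) >= MIN_GROUP_SIZE else None


retailer = ["ann@example.com", "bob@example.com", "cy@example.com"]
broadcaster = ["BOB@example.com ", "cy@example.com", "dee@example.com"]
print(overlap_count(retailer, broadcaster))  # prints 2
```

The point of the sketch is the shape of the output: the caller learns that two customers overlap, not which ones, and a result covering too few people is withheld altogether.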

“Marketplaces are also driving the ability for organizations to combine first-party data with third-party data by enabling them to securely access and share data, data services and applications,” said Snowflake’s Fisher. “This insight can be brought directly into their own cloud data environment, reducing integration costs, delays and risks by eliminating the need for traditional ETL, API, or FTP processes. The impact of these technologies combined are changing the way organizations collaborate, analyse and derive insights from data, enabling them to unlock new opportunities for growth and success.”

Fisher’s reference to that group of three-letter acronyms (TLAs) denotes ETL – extract, transform, load; API – application programming interface; and FTP – file transfer protocol. Because these are all ‘transport’ mechanisms or data channel-enabling technologies of one form or another, they take time to engineer and this is part of what more direct third-party data exchange seeks to circumvent.

Notwithstanding the inevitable competition blossoming in this market and the natural development of differentiated services, there’s a broad consensus forming in this space. “Data is meaningful and valued when we use it to drive change,” said Keri Gilder, CEO, Colt Technology Services, underlining a core truism that we should perhaps take in more practical terms going forwards. Looking for a practical application of this statement, Gilder points to real world use cases.

Data for positive change

“For a business in any industry vertical, mapping real progress against sustainability goals is impossible without information on energy use and carbon emissions, but even these metrics give a view through a narrow lens. Once you interconnect and integrate these measures with third parties’ environmental data – particularly around partners’ scope 1 and 2 emissions such as electricity consumption, fuel and refrigerants and upstream scope 3 such as purchased goods and services and capital goods – you have a clearer picture into your business’ true environmental impact. This intelligence then allows you to drive material improvements across the ecosystem,” said Colt’s Gilder.

As defined and explained by Deloitte, scope 1, 2 and 3 emissions are, “A way of categorizing the different kinds of carbon emissions a company creates in its own operations and in its wider value chain.”

Colt’s travel platform is an example of data sharing to generate action: it integrates with a third party so that when employees enter their requirements, they’re shown emissions data to help them make smart choices around sustainable travel. Suddenly, enthuses CEO Gilder, third-party data becomes an actionable, powerful force for good.

“The use of data from external sources is as old as analytics, and the competitive advantage companies gain from using third party data for insights is massive. With synthetic data, third party information can be exchanged in anonymized ways while also stripping out bias and ensuring datasets are properly shepherded,” said Alan Jacobson, chief data and analytics officer at Alteryx.

“Data is an inherently human product – it risks the same bias as its creator, curator, or collector. The ability to remove PII and other sensitive data to ensure safe and ethical use of the data is key to appropriate business use – not just to keep the overall ‘shape’ of a dataset, but also to meet requirements around sovereignty, governance, and other legal, ethical, or bias-centric roadblocks,” added Jacobson.

The Alteryx data leader also says that synthetic ‘no-party’ data offers the ability to create a fabricated simulacrum of what a dataset should look like. Despite being PII free, models fed with quality synthetic data can often deliver high precision against the initial ‘real’ dataset. Synthetic data techniques enable organizations to open up more data to the enterprise to see around corners and deliver good results.
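As a rough illustration of what keeping the ‘shape’ of a dataset can mean, the Python sketch below fits a simple normal distribution to a handful of made-up basket values and then draws entirely fabricated values from it. This is a deliberately simplistic stand-in and not the technique Alteryx or any other vendor uses; the function name, the sample figures and the choice of a normal distribution are all assumptions for the example.

```python
import random
import statistics


def synthesize(real_values, n, seed=0):
    """Draw n fabricated values from a normal distribution fitted to real_values."""
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up 'real' figures, purely for illustration.
real_basket_values = [23.5, 41.0, 18.2, 36.9, 29.4, 52.1, 33.3, 27.8]
synthetic = synthesize(real_basket_values, n=5000)

# The fabricated rows belong to no real customer, yet the summary statistics
# should sit close to those of the original data.
print(round(statistics.mean(real_basket_values), 1), round(statistics.mean(synthetic), 1))
print(round(statistics.stdev(real_basket_values), 1), round(statistics.stdev(synthetic), 1))
```

Production-grade synthetic data generators model far more than a mean and a spread, including correlations between columns, but the principle of exchanging the distribution rather than the records is the same.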

With all this first, second and third-party data now whirling around us between platforms, we need to manage it and secure it, all within the realm of hyperconverged cloud services. Working in precisely this space, Cohesity has plenty to say on the subject of third-party data usage.

“Sharing data quickly and easily, without compromising cyber and compliance controls, is something that all businesses need to do. But exploiting the full potential of data sharing comes with risks, regardless of where it’s located and what it is,” said Mark Molyneux, EMEA CTO, Cohesity.

This subject now leans towards what Molyneux and team call Digital Operational Resiliency (DOR) and why it is core to third-party data becoming first-class. This is the point at which we concentrate on making sure data is secure, protected and governed, with compliance and security requirements automatically enforced in the background.

“In order for this automation to work consistently, companies must break down their fragmented data silos, bringing everything together on an open data security and management platform, which in turn encrypts every file, examines the content, classifies it, and regulates access to it,” said Cohesity’s Molyneux. “After this, automation can enforce rules, such as ensuring that personal data is only made available in an anonymized form. Once the business case is completed, the data can be automatically deleted, or it can be retained for a defined regulatory period.”
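The classify, restrict and expire sequence Molyneux describes can be sketched in a few lines of Python. This is a toy model under stated assumptions and not Cohesity's platform or API: the Record type, the crude "contains an @" classifier, the "[redacted]" mask and the retention field are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    owner_email: str
    note: str
    created: date
    retain_days: Optional[int] = None  # None: delete once the business case closes


def classify(record: Record) -> str:
    """Crude stand-in for content classification."""
    return "personal" if "@" in record.owner_email else "general"


def release(record: Record) -> dict:
    """Return the view a consumer may see; personal data goes out anonymized."""
    email = "[redacted]" if classify(record) == "personal" else record.owner_email
    return {"owner_email": email, "note": record.note}


def should_delete(record: Record, business_case_closed: bool, today: date) -> bool:
    """Delete after the business case closes, unless a retention period still applies."""
    if not business_case_closed:
        return False
    if record.retain_days is None:
        return True
    return today > record.created + timedelta(days=record.retain_days)


r = Record("ann@example.com", "renewal query", date(2024, 1, 1), retain_days=30)
print(release(r))
print(should_delete(r, business_case_closed=True, today=date(2024, 3, 1)))
```

Because the rules live in code rather than in a policy document, they are applied the same way to every record, which is the consistency the quote is driving at.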

Cleaning the data room

If there is a hot trend among data scientists right now (outside of generative AI and all forms of Machine Learning and automated autonomous controls – okay… and perhaps graph theory too) then it might just be the data clean rooms that make up this sector of the total IT fabric.

“Customers have asked us to build AWS Clean Rooms so they can more easily and securely combine and analyze data stored across various sources and organizations,” noted Akram Chetibi, GM of AWS Clean Rooms at AWS Applications. “Building data clean rooms has historically been complex and takes companies months to create sophisticated privacy controls and specialized data sharing systems that hinders getting business insights quickly. AWS Clean Rooms helps customers launch a clean room in just a few minutes, and removes unnecessary heavy lifting so they can focus on analytics and data insights to drive their business forward.”

The new reality of data sharing might seem somewhat at odds with the way we have always talked about data privacy and security up to now, but this is – as we have hopefully shown – sharing of data shapes, trends, flows, weights and movements with an additional view into outlier and anomaly spikes… rather than actual data ‘values’ in specific terms.
