How to Build Data Products in 2023
Want to promote data to become a first-class citizen? Treat it like a product. Image by author
Companies worldwide are locked in a fierce race to transform into forward-thinking, data-driven organizations.
But a widespread legacy project-oriented mindset has left many of them feeling like they’re trying to sprint in steel boots.
Consider the bank I work at, for instance.
It’s hard to daydream about AI and advanced analytics when engineers are drowning in a sea of duplicated data pipelines — making it difficult to capitalize on the untapped value latent within our treasure trove of data.
And without the right data stack and incentives aligned, individual teams will continue to build costly-to-maintain single-use solutions that only add to the existing mountain of technical debt. Bah!
What’s the solution?
Close your eyes and think of the best products on Earth…
…the quality stuff you swear by every day.
I’m a keen traveler, and my personal must-haves include my trusty MacBook Pro laptop, my workhorse Sony cameras, my formidable DJI Mavic aerial drones, and my ever-steadfast ASICS sneakers.
Some of my favourite products. Can we “replicate” this with data? Image by author
What do all these awesome products have in common?
They have a brand. They exude quality and are trusted. They are produced with pride. They are created with the best ‘ingredients’ or parts. And they’re constantly updated to ensure they dominate their markets.
Disclaimer: I’m not sponsored by any of these brands…but wouldn’t that be nice!
So what’s the point I’m trying to make?
Well, it turns out that the revolution to replicate this product experience is underway within the enterprise data landscape.
Indeed, productizing data assets is being viewed by architects and technologists as a silver bullet to reduce data complexity in large companies.
By shifting from a project mindset to a product mindset, we begin to install the infrastructure and incentive structures necessary to create quality, reusable data assets that can truly transform organisations into not just data-driven, but data-inspired powerhouses.
It’s a very exciting time.
In this article, I’ll guide you through…
- How many large organizations are sinking in a quicksand of data problems;
- How data products are killing multiple birds at once;
- How these strategic data assets are built.
Let’s dive in.
1. Enterprise Data Challenges…uh oh!
Companies that really have their data shizz together are few and far between.
Big Tech — like Airbnb, Amazon, Apple, Google, Meta, Netflix, and Uber — are obvious standouts. That’s because their entire business model relies on having an impeccable data stack.
They are data businesses and represent exceptions to the rule.
Industry demands
But organizations in every industry are feeling the pressure to become data-driven because:
- Data-driven decision-making increases output and reduces waste.
- Consumers are increasingly expecting hyper-personalised digital products & services that leverage powerful analytics like AI and machine learning that are trained on all the quality data the company can muster.
- Scrutiny from regulators is only getting tougher.
How the worlds of AI & machine learning intersect with enterprise analytics. Image by author
To give a personal example, I work in analytics at one of Australia’s ‘Big Four’ banks. And boy, it’s a tough situation.
We’re under pressure to lift productivity while reducing operating costs, which means we need decision-making to be based on evidence (i.e., data) and not on hunches like in the good ol’ days.
We’re under assault by nimble neo banks peeling away our customers with incredible digital services, like the ability to grant them cheap mortgages within minutes — all via a smartphone app — a capability that requires a holistic “360-degree” understanding of the customer that leverages an incredible amount of data.
We’re under pressure from our regulators, who have given us a deadline to begin reporting more granular data. At the same time, the government has mandated the sharing of our customer data with other organizations in line with the increasing global adoption of open banking.
All of this means we need scalable data platforms populated by a complete menu of quality data with advanced analytics capabilities, such as AI and machine learning, ready to derive insights from our data.
Do we have that? Not yet.
Spaghetti bowl of duplicated data pipelines
The reality is the data landscape at many enterprises is bruised, battered, and littered with problems and technical debt.
It’s like a house full of accumulated mess that you need to clean up for an inspection soon. Not much stuff is properly organized and easy to find.
First up is data complexity. Uhhhh!
Today, most companies approach enterprise data with a project mindset. Each time a business function has a problem it wants to solve with data, the company starts a new project to acquire the data, cleanse and prepare it, then analyze it for that specific use case.
Once the work is done, this bespoke single-use solution becomes set and forget, and the team moves on to something else.
What happens when a new problem arises?
You guessed it. As teams often work in silos, a whole new data project starts rather than evolving an existing one.
A mess of ad-hoc duplicated ETL pipelines litters enterprises. Image by author
Over time, this results in an overly complex data landscape characterized by a spaghetti of duplicated ETL pipelines with limited controls, governance, and oversight.
Most things are tactical, not much is strategic.
Data-driven organization? Hah!
So what does that mean for the company?
Bad things. It makes it quite difficult for employees to unlock the latent value in the data.
- Data consumers will get a hold of data whenever and however they can. Different teams use different methods to integrate and transform the data, creating a new data pipeline spaghetti strand for every new project. Aside from adding tech debt, these single-use investments impose a huge burden on the company’s subject matter expert (SME) resources, inflating costs.
- Data lineage is difficult to track and manage. There is a lack of clear ownership and accountability as the data flows throughout the organization in bespoke pipelines. This makes downstream data difficult to trust.
- Data producers don’t know how their data is used, creating data risks. Two teams could process the same data in different ways and produce conflicting analytical results that are then reported to decision-makers and regulators.
All of this means that the company struggles to deliver projects in a timely manner and deliver the best insights the data can offer.
Not a great outcome for colleagues and customers alike.
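To make the "conflicting results" risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records, the field names and the two team pipelines are invented for illustration and are not taken from any real system. It only shows how two reasonable-looking transformations of the same source data can report different numbers.

```python
# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"customer_id": "C1", "amount": 120.0, "status": "settled"},
    {"customer_id": "C1", "amount": -20.0, "status": "refund"},
    {"customer_id": "C2", "amount": 75.5, "status": "settled"},
    {"customer_id": "C2", "amount": 40.0, "status": "pending"},
]


def team_a_total_spend(rows):
    """Team A's single-use pipeline: sums every row, refunds and pending included."""
    return sum(r["amount"] for r in rows)


def team_b_total_spend(rows):
    """Team B's single-use pipeline: sums settled transactions only."""
    return sum(r["amount"] for r in rows if r["status"] == "settled")


total_a = team_a_total_spend(transactions)
total_b = team_b_total_spend(transactions)

# Same source data, two different "total spend" figures.
print(f"Team A reports {total_a}, Team B reports {total_b}")
```

Neither team is wrong in isolation. The problem is that nobody owns a single agreed definition, which is exactly the gap a shared, owned data asset is meant to close.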
2. Data Products Explained
How do we address these problems?
Architects realized that wrapping product thinking around data aligns the incentives and tech stack needed to promote data to a first-class citizen.
So what are data products exactly?
- Data products are high-quality, modular, reusable, and continuously improved data assets that manage and group large quantities of data into logical uses. They can be used for specific projects, then rearranged for something completely different, scaling value across the company over time.
- Data products are wrapped in interfaces to enable discoverability (e.g. data marketplaces) and traceability (data lineage) to make them easy to understand, access, use, and trust.
- Data products can be consumed by applications (through APIs) to support daily business operations and by the analytics community to support business intelligence, data science, and machine learning.
At my company, we’re building a suite of foundational data products that span logical uses such as Customer, Mortgages, Products and Transactions.
These elementary data products — like basic Lego blocks or atomic particles — can be used for a specific purpose, then rearranged and reused for something completely different.
At my work, different combinations of our foundational data products are stacked together to create more valuable enterprise data products, like Customer 360, Interactions and Transaction Categorisation.
These ‘off-the-shelf’ data assets can be strategically relied on across the bank to provide a golden source of information on our customers, allowing us to engage them at the right place, with the right product, at the right time.
Incredibly powerful stuff.
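As a rough illustration of the Lego-block idea, here is a minimal Python sketch that stacks three foundational data products into a Customer 360 style view. The records, field names and the `build_customer_360` helper are all hypothetical and heavily simplified. A real enterprise data product would sit on a data platform rather than in in-memory dictionaries.

```python
# Hypothetical foundational data products, each keyed by customer_id.
customer = {
    "C1": {"name": "Alex", "segment": "retail"},
    "C2": {"name": "Sam", "segment": "business"},
}
mortgages = {
    "C1": [{"loan_id": "M-1", "balance": 350_000}],
}
transactions = {
    "C1": [{"amount": 42.0}, {"amount": 18.5}],
    "C2": [{"amount": 990.0}],
}


def build_customer_360(customer, mortgages, transactions):
    """Combine foundational products into one summary record per customer."""
    view = {}
    for cid, profile in customer.items():
        txns = transactions.get(cid, [])
        view[cid] = {
            **profile,  # reuse the Customer product as-is
            "mortgage_count": len(mortgages.get(cid, [])),
            "transaction_total": sum(t["amount"] for t in txns),
        }
    return view


customer_360 = build_customer_360(customer, mortgages, transactions)
print(customer_360["C1"])
```

The same three inputs could just as easily feed a different enterprise product, which is the reuse the foundational layer is meant to enable.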
Data as a product is a key pillar of data mesh, the next-generation data architecture that many organizations are currently migrating towards, including my own.
Four principles of data mesh: domain-ownership, data as a product, self-serve platform, federated governance. Source: Data Mesh Architecture (with permission)
In short, data mesh moves away from the idea of a single team managing a centralised data lake, towards a decentralised architecture where individual business areas across the company — who understand their data the best — are empowered to own and manage their own data pipelines and data products. This is known as domain-driven ownership.
These business teams are incentivised to craft quality data products that they can then publish on the data mesh and proudly share (or even sell) to other parts of the business.
Data products shared on the mesh. Source: Data Mesh Architecture (with permission)
Overall, data products…
- Make data consumers happy. They make high-quality data easy to find, access and use. They drive simplification, consistency and cost-savings across the enterprise data landscape by offering trustworthy golden sources of truth that can be used and reused by many teams and use cases. Say goodbye to that uncontrollable growth in spaghetti pipelines. Bust that data complexity once and for all! (In theory, at least.)
- Make data governance happy. Domain-driven ownership drives accountability. Data lineage is closely tracked and monitored.
- Make data producers happy. They understand how their data flows through the organisation and how it’s used, providing much-needed visibility over their assets and reducing the risk of accidental misuse that could cause damage to the company.
Want to know more about data mesh? Check out my Explainer 101 article on how the enterprise data landscape evolved towards data mesh.
3. How to Build Great Data Products
In short, organisations need to consider two sides of the same coin.
1. Build the right product.
Without meeting a customer need, a product fails — plain and simple.
2. Build the product right.
Like my MacBook Pro or DJI Mavic drone, data products are meant to be reliable, precision-engineered assets that instill pride and trust.
Important considerations for building data products. Image by author
These two pillars then give rise to a whole host of questions, like:
- How do we know a data product is needed? Who’s using it?
- Who builds and owns data products?
- Where’s the data coming from?
Here’s one model. Data product owners will foster close relationships with various business domains (like Mortgages, Credit, Branches etc.) while working closely with each domain solution engineer whose teams are building the data products.
The data product owner serves as a bridge between the business and engineering.
Let’s dive into things in a bit more detail.
Building the Right Data Products
This requires a virtuous feedback loop between the data producers and data consumers in your company.
Just like with ‘real’ off-the-shelf products, you’ll conduct fantastic research to identify what data products are needed in the ‘market’.
Then link the accountability back to data producers who will be building these data products.
To make them really accountable for making sure the right products roll off the proverbial factory floor, these producers — sitting in their respective business domains — will naturally be the owners of their data products.
This is called domain-driven ownership — a key pillar of data mesh!
Data products solidify a virtuous relationship between users of data and their creators. Image by author
Once V1 of the data product is built and deployed across the mesh, consumers will use it and provide feedback (via ratings, requests and suggestions) to producers, who will then leverage this information to continually refine their offerings.
Like all products, it should be easy for data consumers to find out information about what they’re using.
Like all products, data products have transparent features. Image by author
For example:
- Product name: Customer Foundational Data Product.
- Brand: Built by the core customer team? Great! I trust that.
- Nutrition: How can this data nourish your use case? Is it suitable for use in my new customer segmentation ML model?
- Ingredients: What tables and source systems were used to build the data product?
- Instructions: How do I access and use this data product?
- Packaging size: How big is this data product? Can I load this into Excel or PowerBI? Or should I leverage big data compute?
- Contact info: Who owns this data product? Who built it? How can I reach out to them?
- Barcode: What’s the financial model? How do I pay for it?
- Additional information: Is there a roadmap? How often is the data refreshed? Is there anything else I should know?
All up, treat your shiny new data product like you treat your favourite products.
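One way to picture the product "label" above is as a small metadata record published alongside each data product. The Python sketch below is an assumption-laden illustration: the `DataProductLabel` class, its field names, the example values and the 1 GB desktop-tool threshold are all hypothetical, not a standard or any particular catalog's schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataProductLabel:
    """Hypothetical 'label' for a data product, mirroring the list above."""

    name: str                 # Product name
    owner_team: str           # Brand
    suitable_for: list[str]   # Nutrition: use cases the data can support
    sources: list[str]        # Ingredients: tables and source systems
    access_instructions: str  # Instructions
    size_gb: float            # Packaging size
    contact: str              # Contact info
    cost_model: str           # Barcode: how consumers pay
    refresh_schedule: str     # Additional information

    def fits_in_desktop_tool(self, limit_gb: float = 1.0) -> bool:
        """Rough check for desktop tools; the 1 GB default is an assumed cut-off."""
        return self.size_gb <= limit_gb


# Example values are invented for illustration.
label = DataProductLabel(
    name="Customer Foundational Data Product",
    owner_team="Core customer team",
    suitable_for=["customer segmentation"],
    sources=["crm.customers (hypothetical table)"],
    access_instructions="Request access via the data marketplace",
    size_gb=250.0,
    contact="customer-data@example.com",
    cost_model="internal chargeback",
    refresh_schedule="daily",
)

print(label.name, "fits in a desktop tool:", label.fits_in_desktop_tool())
```

Keeping the label as structured data rather than free text means a marketplace could search, filter and validate it automatically.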
Building the Data Products Right
On the flip side, how do you ensure your products are built to be high quality, reliable and trustworthy?
As an architect, your job is to ensure the data products are properly designed and offer strategic value. Which high-quality golden sources should I ingest my data from? Do my design and schema work well with my tech stack? Will my solution create any tech debt? Does it scale?
As a data engineer, you’ll craft the ETL pipelines designed by architects and solutions engineers. Where are my data products going to sit on the platform? What automated jobs do I need to set up? Were data quality issues addressed?
As a data producer, you’ll oversee all this work and attest to its data lineage, quality, reliability and trustworthiness because you own the data product. If something breaks, it’s on you and your team, as the data mesh federates ownership to individual business domains.
As a UX engineer, your job is to ensure the data products are accessible and easy to use by consumers across the mesh. How do users ‘shop’ for data products? At my work, we’re building a Netflix-style marketplace, which is pretty cool. Can my enterprise tools like Dataiku, PowerBI and Azure Synapse easily access the data?
4. Final Words
A talented entrepreneur once said the following:
“Great companies are built on great products.”
— Elon Musk, 2009
This type of product-oriented thinking for data is now rapidly sweeping across the enterprise data landscape.
Data products are being positioned as the vehicle (or is it bulldozer?) to address a myriad of demands from customers, regulators and competitors all at once, while putting a halt to the unchecked proliferation of duplicated data pipelines that has grown like a tumour over the years.
And the timing couldn’t be more critical.
We’re in a melting pot of technological innovation where things are moving at breakneck speed. Data is exploding, computing power is on the rise, AI is breaking through, and consumer expectations are ever-changing.
Data products are a major step forward for companies aiming to become that advanced-analytics-powered, data-inspired organisation they’ve been chasing since their initial forays into Apache Hadoop-powered data lakes.
Data products, once mature, even open up entirely new revenue streams for established companies, also known as the commercialisation of data. This will close the gap between many industries and the Big Tech digital natives like Google and Netflix, who have built their entire business model around data and their ability to derive actionable insights from it.
But getting data products off the ground is no mean feat.
Companies need the right skills, technology and strategy.
They need to commit to a large once-off capital expenditure in order to realise the dream of a far-greater ROI and lower cost-per-use down the line, compared to the current project-driven way of working.
The sheer number of organisations embracing data products demonstrates that they believe this transformation will be worth it.