While the results of data science are often productized, fed into other analytic systems, and incorporated into mission-critical enterprise systems and business processes, the core value of data science lies in innovation. The goal is to examine data and come up with new insights that can help run your business better.
The work of data scientists is also highly dependent on others — most especially IT. Yet, many of the ways that IT operates don’t naturally support the work of data scientists. IT craves predictability and is focused on running mission-critical systems in a stable and reliable way while enabling users to solve their own problems to the greatest extent possible. But the type of IT environment that’s appropriate for a business analyst is different from the space a data scientist needs.
This article examines how IT can create the perfect laboratory for data scientists – one that supports experimentation and creativity – so they don’t end up doing work they shouldn’t be doing. In such an environment, data scientists can go as fast and as far as they want to go without barriers, while IT provides on- and off-ramps for the data science process.
Why the IT-data science dynamic is tricky
Think of a business as a factory, where there’s a relationship between the people who create the products and those who run the factory. In many ways, IT is like those in charge of running a stable factory. Data scientists, by contrast, come up with both new ways to make the factory run better and new products for the factory to make and push out to the market. IT should provide data scientists with the raw materials and capabilities to do their jobs, and should also test their prototypes. Eventually, once the products are proven and reliable, data scientists want their work to become part of the factory, so they can hand off oversight to IT and no longer have to babysit these projects. This allows data scientists to continue to do what they do best: experiment and innovate.
IT generally follows standardized processes, methodologies, systems, and tools. IT also relies on automation, as IT can’t function using manual processes in unique environments. By contrast, for data science, each problem and each scientist is singular, and every problem has a unique set of data and tools required to solve that problem.
Effective data science often operates comfortably in the unknown. The problems data scientists are trying to solve are generally open-ended and require adaptability — they will need access to all the data that’s available to be able to experiment, with no fixed time horizon, to find the best solution. This goes against the fixed time frames and predictability IT wants to operate within. IT generally operates in a predictable manner that manages changes in an orderly way, whereas data science is agile and spontaneous, seeking to go where the data leads. In data science there are always new tools, techniques, algorithms, and research that are being incorporated into the work. For IT to properly support data science departments, it must keep up with that.
In essence, it is IT’s job to make sure the factory floor runs efficiently, whereas the data scientists’ job is to push the boundaries. There’s an inherent tension between these two roles. IT and data scientists find themselves at loggerheads when data scientists want to take risks IT doesn’t want to accept, or when IT hasn’t established a solid enough foundation for data scientists to do their work independently. In the ideal relationship, data science gets a baseline of capabilities from IT, and IT also sets limits to prevent unnecessary risks.
On-ramps and off-ramps: The optimal IT-data science relationship
So what does the optimal IT-data science relationship look like? For starters, IT creates on-ramps that allow data scientists to do their work. This involves preparing supportive environments for data scientists. IT creates data products incorporating all data that are both usable and accurate, and unifies all sources of data into one or more product, people, or customer objects. Ideally, data scientists are free to create new purpose-built data sets to drive innovation. IT must provide an environment where the data science team can operate at all levels of an organization’s data stack, and bring in new data when necessary.
Once data scientists get something right with their innovations, IT must provide an off-ramp from the lab so the models and analytics, and the data supply chains that feed them, can be passed off to the IT team to run. In advanced enterprises, IT can provide the data science team with tools to do work that is easily transferred into production. IT and data science teams must work together to establish plans for moving that work into a production environment. Data scientists must be aware that they’re not working in an experimentation vacuum and have to keep in mind the practical implications of how their creations can be mainstreamed and brought to market. To make this a reality, IT should help data scientists avoid common pain points such as unnecessarily onerous data prep, make it simple to find and prepare data (often through data catalogs), and find ways to test and support data ops.
IT can also support data science by working collaboratively in an R&D-like fashion where the production process never stops. In such a setup, when data scientists come up with new tools, IT can start validating a tool even before the product is ready for production. It’s not enough for IT to understand how to support the data scientist’s innovation; IT also has to have a data science production factory that can accept new algorithms, productize them, and bring them to operational maturity, with all the required resiliency, compliance, and other factors. This speeds up the iterative process by allowing data scientists to focus on creating new algorithms instead of building the infrastructure to put those algorithms to use. Additionally, IT can aid data scientists by ensuring they have the computing power necessary to build models. If data scientists can’t build effective models because they don’t have the GPUs or data they need, data science will fail to function properly.
By providing the on- and off-ramps that empower data scientists to do their work, IT can foster a symbiotic and harmonious relationship in which IT and data scientists create a thriving production cycle for the business.
To further explore data science best practices and how to adopt a framework that maximizes business productivity and accelerates time to value, check out the new IDC white paper: Industrializing Data Science with Data Analytics Factory Framework (DAF).
Interested in discussing ways to improve collaboration between your IT teams and data scientists? Contact Matt Maccaux at: [email protected].
About Matt Maccaux
As Global Field CTO for HPE Ezmeral software, Matt brings deep subject-matter expertise in big data analytics and data science, machine learning, application development & modernization, and IoT as well as cloud, virtualization, and containerization technologies.
Copyright © 2021 IDG Communications, Inc.