Unlocking Enterprise AI with Data Science

So you’ve created a new, innovative data science model. Now what? It’s time to operationalize. For those unfamiliar with the term, operationalization (o16n) refers to the deployment of a machine learning model across the enterprise. It also means it’s time to validate the robustness of the model and its readiness to go live.

With that in mind, we invited Alexander Hubert, Dataiku’s Lead Data Scientist, to shed light on how to effectively deploy and operationalize data science projects to maximize business impact. After all, as Alexander explains, “data science is only valuable if it’s delivering value to the business.”

Bridging the gap from design to o16n

Successfully deploying enterprise-wide data science models can be very challenging, as doing so requires careful coordination across multiple teams. In fact, according to a recent Dataiku user study, 80% of data science projects are never truly operationalized, and 60% of individuals surveyed are unsure of how many models they have in production.

With this in mind, Dataiku developed a five-step process for operationalizing data science: IT environment consistency, continuous model retraining, functional monitoring of value, a robust data workflow, and performance and scalability testing.

All too often, issues arise because the IT environment in which the model was created differs greatly from the one in which it is deployed. Ensuring these two environments are aligned not only avoids potential performance issues or data inconsistencies down the line, it can also reduce costs by as much as half. Alexander also recommends using a package management tool for each project and, for effectively managing the code base and enabling a robust rollback strategy, a versioning tool.
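One simple way to catch design/deployment drift is to compare the pinned package versions of the two environments before going live. The sketch below is a minimal illustration of that idea, assuming `requirements.txt`-style pins; the package names and versions are placeholders, not taken from any real project.

```python
def parse_requirements(text):
    """Parse 'package==version' lines into a dict."""
    pins = {}
    for line in text.strip().splitlines():
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins

def find_mismatches(design, deployment):
    """Return packages whose versions differ (or are missing) in deployment."""
    mismatches = {}
    for name, version in design.items():
        if deployment.get(name) != version:
            mismatches[name] = (version, deployment.get(name))
    return mismatches

# Illustrative pins for the design and deployment environments.
design_env = parse_requirements("""
scikit-learn==1.4.2
pandas==2.2.1
""")
deploy_env = parse_requirements("""
scikit-learn==1.3.0
pandas==2.2.1
""")

print(find_mismatches(design_env, deploy_env))
# {'scikit-learn': ('1.4.2', '1.3.0')}
```

Running a check like this as a pre-deployment gate turns "the environments should match" into something enforceable rather than a convention.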

Predictive models need to be constantly updated to maintain their competitive advantage. As a result, models must be continuously retrained, which, if done manually, can be incredibly time-consuming and arduous. By automating the updating process, data science leaders can refocus their energy on more important matters. Creating an alerting system to prompt human quality checks, along with A/B testing, is also critical for ensuring the automated retraining process goes as smoothly as possible.
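A retraining trigger of this kind can be sketched in a few lines: when a live metric drifts below its baseline, fire an alert for a human quality check and kick off retraining. The metric (AUC), threshold, and the `retrain`/`alert` callables below are illustrative placeholders, not Dataiku APIs.

```python
def check_and_retrain(current_auc, baseline_auc, tolerance=0.05,
                      retrain=lambda: "retrained",
                      alert=lambda msg: None):
    """Retrain when the live metric drifts below the baseline, and raise
    an alert so a human can review before the new model goes live."""
    if current_auc < baseline_auc - tolerance:
        alert(f"AUC dropped from {baseline_auc:.2f} to {current_auc:.2f}")
        return retrain()
    return "no action"

alerts = []
# A clear drop below baseline: triggers an alert and a retrain.
result = check_and_retrain(0.78, 0.86, alert=alerts.append)
print(result, alerts)
```

In practice the `retrain` callable would launch a scheduled training job and the `alert` callable would notify the team; the point is that the decision logic is automated while the quality check stays with a human.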

Demonstrating and communicating the end results of your predictive model and its impact on the system is one of the most critical steps in this entire process. Functional monitoring combined with a robust, multi-channel communication strategy is one powerful way of doing this. Slack, web-based dashboards, and daily email alerts are just some of the ways you can share insights and data science tools across the enterprise.
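The multi-channel pattern described above can be sketched as a single metric update fanned out to several channels. The channel implementations here just collect messages for illustration; in a real setup they would post to Slack, update a dashboard, or send an email, and the metric name is invented.

```python
class Channel:
    """A stand-in for a notification channel (Slack, dashboard, email)."""
    def __init__(self, name):
        self.name = name
        self.messages = []

    def send(self, message):
        self.messages.append(message)

def broadcast(channels, metric, value):
    """Send one functional-monitoring update to every channel."""
    msg = f"{metric}: {value}"
    for ch in channels:
        ch.send(msg)

slack = Channel("slack")
dashboard = Channel("dashboard")
email = Channel("email")
broadcast([slack, dashboard, email], "daily_conversions_lift", "+4.2%")
print(slack.messages)
# ['daily_conversions_lift: +4.2%']
```

Keeping the broadcast logic separate from the channel implementations makes it easy to add or drop channels without touching the monitoring code.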

Implementing a robust data workflow solution will help ensure that the right data is collected as accurately and efficiently as possible. Though there isn’t a one-size-fits-all approach to optimizing data workflows, they are a critical component of ensuring the credibility of your model. Alexander also briefly discusses some of the associated challenges, such as developing failover strategies and systems integration.
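One of those failover strategies can be sketched as trying data sources in priority order and falling back when one fails. The sources below are simple callables standing in for real connectors, and the sample record is invented.

```python
def read_with_failover(sources):
    """Try each (name, reader) pair in order; return the first successful read."""
    errors = []
    for name, source in sources:
        try:
            return name, source()
        except Exception as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all sources failed: {errors}")

def primary():
    # Simulate an outage of the primary source.
    raise ConnectionError("primary warehouse unreachable")

def secondary():
    return [{"customer_id": 1, "value": 42}]

used, rows = read_with_failover([("primary", primary), ("secondary", secondary)])
print(used, rows)
# secondary [{'customer_id': 1, 'value': 42}]
```

Logging which source was actually used (the `used` value here) matters as much as the fallback itself, since silently serving stale secondary data can undermine the model's credibility.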

Last but not least, testing for performance and scalability is critical to ensuring your platform is agile enough to accommodate rapid change without disruption. For example, what happens when the volume of customer requests spikes? How does the system respond to especially complex requests? When testing for scalability, Alexander recommends actively monitoring data no longer used in the pipeline, actively monitoring job execution time, and using large-scale processing tools to optimize pipelines.
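Monitoring job execution time can be as simple as flagging runs that exceed the historical average by some factor. The timings and the 1.5x threshold below are illustrative assumptions, not recommendations from the talk.

```python
def slow_run(durations, latest, factor=1.5):
    """Return True if the latest run exceeds the historical mean by `factor`."""
    baseline = sum(durations) / len(durations)
    return latest > factor * baseline

history = [120, 130, 125, 128]  # past run durations, in seconds
print(slow_run(history, 310))   # a spike well above baseline -> True
print(slow_run(history, 140))   # normal variation -> False
```

A check like this, wired to the alerting system described earlier, catches pipeline degradation before a spike in request volume turns it into an outage.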

Building a ready AI project

The fundamental challenge of deploying AI is not software or tech but the organization itself. Siloed data, siloed people, and siloed processes can limit an organization’s ability to adapt in a world where customer demands are rapidly changing.

Alexander goes on to outline two approaches to data science: exclusive and inclusive. In the exclusive approach, work is siloed within specific, specialized teams, whereas with the inclusive approach all subject-matter experts (SMEs) are involved. Though the exclusive approach is generally more expedient and precise, the inclusive approach can deliver more strategically aligned models, helps protect against bias, and supports an iterative approach to development.

Finally, Alexander emphasizes the need to provide users with a single, self-service analytics framework that empowers your people to be creative with data. As he puts it, “start small, iterate and optimize later.”

