Applied AI: Enterprise Data Science
In the 70 years since Alan Turing published "Computing Machinery and Intelligence," Artificial Intelligence (AI) has gone from a theoretical concept to an integral part of our everyday lives.
From recommending what to watch next on Netflix to revolutionizing the way we diagnose life-threatening illnesses such as cancer, there are few areas of our lives that AI has not improved or, at the very least, affected.
Businesses of all sizes and types are also leveraging AI not only to develop new products and services, but also to optimize back-office processes and boost productivity everywhere from newsrooms to the manufacturing floor.
However, despite the vast potential business value AI brings to the table, many of these glowing reports overlook the rather dismal failure rate of AI projects. According to a recent IDC survey, a quarter of companies surveyed reported up to a 50% failure rate. Other studies have found that only 12% of AI projects make it past the piloting phase and into production. Though building AI and ML solutions is easier than ever before, operationalizing and scaling them across the enterprise clearly remains difficult, if not impossible, for some. Having the right tools in place is important, but it is not enough: applied AI is about ensuring the underlying systems work together seamlessly.
Most AI/ML projects fail because of three primary causes:
- Lack of reproducibility - An initial AI/ML solution may perform well. However, when others try to recreate the solution, it doesn’t deliver the same results.
- Lack of leveragability - While data scientists build the models, operationalizing them is done either by the data scientists themselves or by an MLOps team. Robust AI requires integrated, full data stack visibility and monitoring shared between these two teams.
- Lack of scalability - In order for AI to truly deliver ROI and really have any meaningful strategic impact, it must achieve scale. However, many organizations lack the IT infrastructure or unified vision to accomplish this goal.
This is where Applied AI comes in.
What is Applied AI?
The term “applied AI” encompasses all of the tools and artifacts underlying the operationalization of AI from experimentation to production. The goal is to build a highly composable network of tools, systems and other IT artifacts capable of running multiple AI/ML projects at once.
In other words, applied AI is not just about the development or deployment of AI/ML, but ensuring it runs on your data and delivers real-world results.
As you can see in the diagram below, only a small fraction of real-world AI/ML is code (highlighted in dark grey). The rest is a vast and complex system of tools, processes and infrastructure:
Image sourced from "Hidden Technical Debt in Machine Learning Systems," https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
AI works by combining large amounts of data with fast, iterative processing and intelligent algorithms, allowing the software to learn automatically from patterns or features in the data. Having a robust enterprise data science framework in place is critical for ensuring that the data flowing in and out of the AI is fully integrated, highly accurate and fast-moving.
The goal of “Enterprise data science” is to extract all possible knowledge or actionable information from the digital assets of an enterprise and use it as a driver for change and value creation throughout the organization. An Enterprise Data Strategy (EDS) helps organizations do this by aligning business strategy, technology roadmap, security plan, and other high priority facets into one holistic vision or plan of action.
Effective Enterprise Data Strategies:
- Outline all the capabilities necessary to achieve the desired business outcome (data quality management and tools, organizational structure, data acquisition, network strategy, compliance, ethics, etc.)
- Lay out a multi-year roadmap for achieving the capabilities
- Set clear expectations about what’s possible (timeframe, costs, etc.)
What are the key considerations for Applied AI?
As mentioned before, applied AI refers to all of the underlying people, processes and systems that support, essentially, the democratization of AI. Though every organization’s approach will look different, some key components that enable applied AI are:
- Centralized data repository - As organizations prepare their infrastructure for enterprise AI, storage, and specifically the ability to scale storage as the volume of data grows, must be a top priority. Many organizations are leveraging Hadoop to build enterprise data and analytics environments capable of supporting the massive amounts of data needed to run AI/ML.
- Catalogue of data points or features - An easy one-stop shop for understanding the actual meaning behind available data. It is especially important for sharing information across multiple teams.
- Data Quality from training to production - Strict procedures, documentation, and testing of any production data point. Data must be reproducible from your enterprise data warehouse. Data verification and cleansing - the process of updating or removing data from a database that is inaccurate, incomplete, improperly formatted or duplicated - are also critical components of applied AI.
- AI/ML data pipelines - The architectural system for collecting, transporting, processing, transforming, storing, retrieving, and presenting data must be in place and fully optimized.
Image sourced from "Data Pipelines and the AI Engine," https://www.extremenetworks.com/extreme-networks-blog/data-pipelines-and-the-ai-engine/
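To make the data verification and cleansing step described above more concrete, here is a minimal Python sketch. It is an illustration only: the `CustomerRecord` schema, the validity rules (an "@" in the email, an age between 0 and 120) and the function names are all assumptions made for this example, not part of any particular tool or of the pipeline pictured above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical schema, used only for illustration.
    customer_id: str
    email: Optional[str]
    age: Optional[int]


def normalize(record: CustomerRecord) -> CustomerRecord:
    """Repair formatting: trim whitespace and lowercase the email."""
    email = record.email.strip().lower() if record.email else None
    return CustomerRecord(record.customer_id.strip(), email, record.age)


def is_valid(record: CustomerRecord) -> bool:
    """Reject incomplete, badly formatted, or implausible records."""
    if not record.customer_id or not record.email:
        return False  # incomplete
    if "@" not in record.email:
        return False  # improperly formatted (assumed rule)
    if record.age is None or not 0 <= record.age <= 120:
        return False  # missing or implausible value (assumed range)
    return True


def cleanse(records: Iterable[CustomerRecord]) -> List[CustomerRecord]:
    """Normalize, validate, and de-duplicate records by customer_id."""
    seen: Set[str] = set()
    cleaned: List[CustomerRecord] = []
    for record in map(normalize, records):
        if not is_valid(record) or record.customer_id in seen:
            continue  # drop invalid rows and later duplicates
        seen.add(record.customer_id)
        cleaned.append(record)
    return cleaned
```

Running the same `cleanse` function on both training and production data is one simple way to keep the two consistent, which is the point of treating data quality as a concern "from training to production."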
- Deep learning algorithms are highly dependent on communications and require “high-bandwidth, low-latency and creative network architectures” to function. However, traditional network models simply aren’t designed to handle massive amounts of data. Organizations looking to launch enterprise analytics need to work with their providers to build upgraded networking infrastructure capable of delivering exceptional performance, flexible bandwidth, high availability and new levels of control. In some use cases, such as computer vision, networking infrastructure also needs to facilitate real-time data transmission.
- AI Data Processing & Training - Deep learning requires computationally intensive training, and more computational power shortens training cycles. In addition, as the number of AI models in production grows, so do issues with training. To provide the computing power necessary to support these needs, many companies rely on graphics processing units (GPUs). Designed for parallel processing, GPUs can significantly accelerate the deep learning training process, especially for tasks that require compute-intensive matrix multiplication, such as image classification, video analysis, and natural language processing (NLP).
- Model ingestion - Capability to upload, store and score models in multiple formats. Provides a robust set of options necessary to future proof and easily update a model without errors.
- Feature creation - Feature engineering, the link between raw data and model inputs, is the process of creating features (also called "attributes") that don't already exist in the dataset. It should support the ability to easily upload multiple languages and minimize dev work.
- Champion/Challengers/Model Selection - The idea is to compare two or more models against each other in order to promote the one model that performs the best. If the dominant one deteriorates later on, the other can easily be swapped in.
- Access control - By limiting who can update models, maintaining appropriate activity and prediction logs, and adequately testing models, businesses can minimize risk, ensure legal and regulatory compliance, and create a repeatable process to scale AI adoption.
- Changelog / version control - Successful AI projects require a significant amount of trial & error. Version Control Systems help a product team track & manage changes to source code over time.
- Model Lifecycle Management - The cyclical process that data science projects follow. It defines each step that an organization should follow to take advantage of machine learning and artificial intelligence (AI) to derive practical business value. It should also serve as a single store for all past and current models, with complete metadata on creator, settings, inputs, etc.
- Production system & Model Monitoring - As ML and predictive model performance tends to gradually degrade or “drift” over time, system performance needs to be tracked continuously and in real time. Inadequate monitoring can lead to incorrect models left unchecked in production, stale models that stop adding business value, or subtle bugs in models that appear over time and never get caught.
- Population Stability Index (PSI) - Just because data is available doesn’t mean it’s correct. PSI goes beyond “data availability”: it measures how much a variable’s distribution has shifted between two samples over time.
- Real time alerting - Automated anomaly detection helps organizations identify and manage model performance issues. However, these systems need to be capable of tracking thousands of features and models.
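The PSI and alerting items above can be illustrated with a short Python sketch. PSI is commonly computed by binning a baseline sample and a current sample the same way, then summing (a_i - e_i) * ln(a_i / e_i) over the bins, where e_i and a_i are the share of the baseline and current samples falling in bin i. The equal-width binning, the small `eps` floor for empty bins, the function names, and the 0.1 / 0.25 thresholds (a widely used rule of thumb, not a standard) are assumptions made for this example.

```python
import bisect
import math
from typing import List, Sequence


def interior_edges(baseline: Sequence[float], bins: int) -> List[float]:
    """Equal-width interior bin edges spanning the baseline sample."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def proportions(sample: Sequence[float], edges: List[float],
                eps: float = 1e-6) -> List[float]:
    """Share of the sample in each bin, floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        # Values outside the baseline range land in the first or last bin.
        counts[bisect.bisect_right(edges, x)] += 1
    return [max(c / len(sample), eps) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10) -> float:
    """Population Stability Index between a baseline and a current sample."""
    edges = interior_edges(baseline, bins)
    expected = proportions(baseline, edges)
    actual = proportions(current, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def drift_status(value: float, warn: float = 0.1, alert: float = 0.25) -> str:
    """Map a PSI value to a status using assumed rule-of-thumb thresholds."""
    if value >= alert:
        return "alert"
    if value >= warn:
        return "warn"
    return "stable"
```

In a monitoring job, `psi` would be evaluated per feature on a schedule, comparing the training-time sample with recent production data, and any feature whose `drift_status` is not "stable" would be routed to the alerting system.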
Real-World Examples of Applied AI
Though applied AI is clearly still very much in its infancy, a number of companies are leading the way in terms of adoption:
Target’s centralized enterprise data and analytics system pulls data from across the enterprise, from website clicks to underlying systems such as APIs, and transforms it into predictive insights or forecasts.
Like many big-box retailers, Target is using applied AI to better understand its customers’ shopping preferences and deploy targeted, personalized marketing campaigns to individuals based on these insights. More notably, it can also use these AI tools to predict major life events that could dramatically change shopping habits, such as pregnancy or home ownership.
With its Deep Brew project, Starbucks is “ideating and working on a broad suite of AI tools to help elevate every aspect of the business and the in-store and customer experience.” By integrating and leveraging massive amounts of data from all facets of the business, Starbucks is starting to implement a wide variety of AI initiatives ranging from personalized product recommendations to predictive inventory management. Starbucks is even outfitting espresso machines with sensors that centrally log and analyze every shot delivered and use predictive analytics to assess potential areas for tuning and preventative maintenance of the machines.
However, the real end goal is for AI and data analytics to forge a deeper connection with the customer base. “We plan to leverage Deep Brew in ways that free up our partners [in-store workers], so that they can spend more time connecting with customers,” Starbucks CEO Kevin Johnson wrote in a 2019 blog post.
Luckily for us, Starbucks has been very transparent about their AI/ML approach and you can view a full overview of its Deep Brew System in the video below:
With over 1B active users, Facebook has one of the largest data warehouses in the world, storing more than 300 petabytes. As a truly AI-centric organization, for years Facebook has been leveraging AI to do all sorts of interesting things from detecting hate speech to helping advertisers better target and deliver personalized marketing messaging to potential customers.
Though Facebook’s aggressive pursuit and questionable application of user data is controversial, to say the least, its overall enterprise data approach has enabled it to become a pioneer in the AI/ML realm. According to an October 1, 2020 blog post, Facebook’s Applied AI research team is currently focused on:
- Computer Vision: Developing models to better understand visual content
- Language & translation technologies (LATTE): Connecting people no matter their language preference
- Personalization: Providing value by connecting people to what’s meaningful for them
- AI Infra: Enabling and scaling artificial intelligence and its use at Facebook
- Facebook Reality Labs (FRL): Building VR and immersive experiences
Like Starbucks, Facebook is fairly open about what their AI/ML infrastructure looks like.
Image sourced from "Overview of Facebook’s AI ecosystem," https://firstname.lastname@example.org/how-facebook-scales-artificial-intelligence-machine-learning-693706ae296f