Data Science and MLOps use case
To operationalize data analysis and model creation, your enterprise needs integrated systems and processes. Cloud Pak for Data as a Service provides the processes and technologies to enable your enterprise to develop and deploy machine learning models and other data science applications.
Challenges
You can solve the following challenges for your enterprise by implementing a Data Science and MLOps use case:
- Accessing high-quality data: Organizations need to provide easy access to high-quality, governed data for the data science teams who use the data to build models.
- Operationalizing model building and deployment: Organizations need to implement repeatable processes to build models and deploy them to production environments quickly and efficiently.
- Monitoring and retraining models: Organizations need to automate the monitoring and retraining of models based on production feedback.
Example: Golden Bank's challenges
Follow the story of Golden Bank as it implements a Data Science and MLOps process to expand its business by offering low-rate mortgage renewals for online applications. Data scientists at Golden Bank need to create a mortgage approval model that avoids risk and treats all applicants fairly. They must also automate the model retraining to optimize model performance.
Process
To implement Data Science and MLOps for your enterprise, your organization can follow this process:
- Prepare and share the data
- Build and train models
- Deploy models
- Monitor deployed models
- Automate the AI lifecycle
The watsonx.ai Studio, watsonx.ai Runtime, Watson OpenScale, and IBM Knowledge Catalog services in Cloud Pak for Data as a Service provide the tools and processes that your organization needs to implement a Data Science and MLOps solution.
2. Build and train models
To get predictive insights based on your data, data scientists, business analysts, and machine learning engineers can build and train models. Data scientists use Cloud Pak for Data as a Service services to build the AI models, ensuring that the right algorithms and optimizations are used to make predictions that help to solve business problems.
What you can use | What you can do | Best to use when |
---|---|---|
AutoAI | Use AutoAI in watsonx.ai Studio to automatically select algorithms, engineer features, generate pipeline candidates, and train model pipeline candidates. Then, evaluate the ranked pipelines and save the best as models. Deploy the trained models to a space, or export the model training pipeline that you like from AutoAI into a notebook to refine it. | You want an advanced and automated way to build a good set of training pipelines and models quickly. You want to be able to export the generated pipelines to refine them. |
Notebooks and scripts | Use notebooks and scripts in watsonx.ai Studio to write your own feature engineering, model training, and evaluation code in Python or R. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. Code with your favorite open source frameworks and libraries. | You want to use Python or R coding skills to have full control over the code that is used to create, train, and evaluate the models. |
SPSS Modeler flows | Use SPSS Modeler flows in watsonx.ai Studio to create your own model training, evaluation, and scoring flows. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. | You want a simple way to explore data and define model training, evaluation, and scoring flows. |
RStudio | Analyze data and build and test models by working with R in RStudio. | You want to use a development environment to work in R. |
Decision Optimization | Prepare data, import models, solve problems and compare scenarios, visualize data, find solutions, produce reports, and save models to deploy with watsonx.ai Runtime. | You need to evaluate millions of possibilities to find the best solution to a prescriptive analytics problem. |
Federated learning | Train a common model that uses distributed data. | You need to train a model without moving, combining, or sharing data that is distributed across multiple locations. |
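In federated learning, each party trains on its own data and shares only model updates, which an aggregator combines into the common model. Federated averaging (FedAvg) is one widely used way to combine those updates: each party's weights are averaged in proportion to how many training samples it holds. The sketch below illustrates that rule in plain Python; it is not the watsonx.ai Federated Learning API, and the function name `federated_average` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    party_weights: Sequence[Sequence[float]],
    party_sample_counts: Sequence[int],
) -> List[float]:
    """Combine locally trained weight vectors into one global weight vector.

    Each party's weights are scaled by its share of the total number of
    training samples (the FedAvg rule). Raw data never leaves a party;
    only the weights and the sample counts are sent to the aggregator.
    """
    if not party_weights or len(party_weights) != len(party_sample_counts):
        raise ValueError("need at least one party and one sample count per party")
    total = sum(party_sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(party_weights[0])
    if any(len(w) != dim for w in party_weights):
        raise ValueError("all parties must report weight vectors of the same length")

    global_weights = [0.0] * dim
    for weights, count in zip(party_weights, party_sample_counts):
        share = count / total  # this party's fraction of all training samples
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


if __name__ == "__main__":
    # Three hypothetical sites report locally trained weights and sample counts.
    local_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    sample_counts = [100, 100, 200]
    print(federated_average(local_weights, sample_counts))  # [3.5, 4.5]
```

Weighting by sample count keeps a party with little data from pulling the common model as strongly as a party with much more data; a simple unweighted mean would treat every party equally regardless of size.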
Example: Golden Bank's model building and training
Data scientists at Golden Bank create a model, "Mortgage Approval Model", that avoids unanticipated risk and treats all applicants fairly. They want to track the history and performance of the model from the beginning, so they add a model use case to the "Mortgage Approval Catalog". They run a notebook to build the model and predict which applicants qualify for mortgages. The details of the model training are automatically captured as metadata in the model use case.
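To make the notebook step concrete, the following sketch trains a tiny mortgage approval classifier and evaluates it on held-out rows. It assumes two hypothetical features (a scaled credit score and a debt-to-income ratio) and made-up data, and it fits logistic regression with plain gradient descent so that it runs with only the Python standard library; Golden Bank's actual notebook, features, and algorithm are not described in this use case.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float, int]  # (scaled credit score, debt-to-income ratio, approved?)

# Hypothetical training and held-out data.
TRAINING_ROWS: List[Row] = [
    (0.90, 0.20, 1),
    (0.80, 0.30, 1),
    (0.85, 0.25, 1),
    (0.30, 0.70, 0),
    (0.20, 0.60, 0),
    (0.40, 0.80, 0),
]
HOLDOUT_ROWS: List[Row] = [(0.875, 0.225, 1), (0.35, 0.75, 0)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    rows: Sequence[Row], learning_rate: float = 1.0, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit two weights and a bias with full-batch gradient descent on log loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x1, x2, label in rows:
            # Gradient of the log loss with respect to the score is (p - y).
            error = sigmoid(weights[0] * x1 + weights[1] * x2 + bias) - label
            grad_w[0] += error * x1
            grad_w[1] += error * x2
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x1: float, x2: float) -> int:
    """Return 1 (approve) when the predicted probability is at least 0.5."""
    return int(sigmoid(weights[0] * x1 + weights[1] * x2 + bias) >= 0.5)


def accuracy(weights: Sequence[float], bias: float, rows: Sequence[Row]) -> float:
    """Fraction of rows whose predicted label matches the actual label."""
    hits = sum(predict(weights, bias, x1, x2) == y for x1, x2, y in rows)
    return hits / len(rows)


if __name__ == "__main__":
    w, b = train_logistic_regression(TRAINING_ROWS)
    print("weights:", w, "bias:", b)
    print("training accuracy:", accuracy(w, b, TRAINING_ROWS))
    print("holdout accuracy:", accuracy(w, b, HOLDOUT_ROWS))
```

In practice a notebook would load a governed data asset from the project and use an established library, but the shape is the same: engineer features, fit on training rows, and report metrics on rows the model did not see.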
3. Deploy models
When operations team members deploy your AI models, the models become available for applications to use for scoring and predictions to help drive actions.
What you can use | What you can do | Best to use when |
---|---|---|
Spaces user interface | Use the Spaces UI to deploy models and other assets from projects to spaces. | You want to deploy models and view deployment information in a collaborative workspace. |
Example: Golden Bank's model deployment
The operations team members at Golden Bank promote the "Mortgage Approval Model" from the project to a deployment space and then create an online model deployment.
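After the online deployment is created, an application scores applicants by sending a JSON request body to the deployment's scoring endpoint. The sketch below only builds that request body. It assumes the `input_data` / `fields` / `values` shape that watsonx.ai Runtime online deployments accept (confirm it against the current API reference before relying on it), and the helper name, field names, and applicant values are hypothetical.

```python
import json
from typing import Any, Dict, Sequence

# Hypothetical input columns for the "Mortgage Approval Model".
FIELDS = ["CreditScore", "DebtToIncome", "LoanAmount"]


def build_scoring_payload(
    fields: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Dict[str, Any]:
    """Wrap applicant rows in an input_data/fields/values request body.

    Every row must supply one value per field, in the same order as `fields`.
    """
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values per row, got {len(row)}")
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


if __name__ == "__main__":
    payload = build_scoring_payload(FIELDS, [[720, 0.28, 350000], [610, 0.45, 280000]])
    # This JSON string is what an application would send to the scoring endpoint.
    print(json.dumps(payload, indent=2))
```

Validating row lengths on the client side catches column mismatches before the request reaches the deployment, where the error would be harder to trace.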
4. Monitor deployed models
After models are deployed, it is important to monitor them to make sure that they are performing well. Data scientists must watch for model performance and data consistency issues.
What you can use | What you can do | Best to use when |
---|---|---|
Watson OpenScale | Monitor model fairness issues across multiple features. Monitor model performance and data consistency over time. Explain how the model arrived at certain predictions with weighted factors. Maintain and report on model governance and lifecycle across your organization. | You have features that are protected or that might affect prediction fairness. You want to track model performance and data consistency over time. You want to know why the model gives certain predictions. |
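One fairness measure that Watson OpenScale reports is disparate impact: the rate of favorable outcomes for a monitored group divided by the rate for a reference group. A ratio well below 1 means that the monitored group receives favorable outcomes less often. The sketch below computes the ratio by hand to show what the monitor measures; it is not the Watson OpenScale SDK, and the decisions and the 0.8 threshold are assumptions for illustration.

```python
from typing import Sequence


def favorable_rate(outcomes: Sequence[int]) -> float:
    """Fraction of outcomes that are favorable (1 = mortgage approved)."""
    if not outcomes:
        raise ValueError("group has no scored records")
    return sum(outcomes) / len(outcomes)


def disparate_impact(monitored: Sequence[int], reference: Sequence[int]) -> float:
    """Favorable rate of the monitored group divided by that of the reference group."""
    reference_rate = favorable_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(monitored) / reference_rate


if __name__ == "__main__":
    # Hypothetical model decisions, split by the value of a protected attribute.
    monitored_group = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]  # 4 of 10 approved
    reference_group = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 of 10 approved
    ratio = disparate_impact(monitored_group, reference_group)
    threshold = 0.8  # assumed fairness threshold
    print(f"disparate impact = {ratio:.2f}; below threshold = {ratio < threshold}")
```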
Example: Golden Bank's model monitoring
Data scientists at Golden Bank use Watson OpenScale to monitor the deployed "Mortgage Approval Model" to ensure that it is accurate and treats all Golden Bank mortgage applicants fairly. They run a notebook to set up monitors for the model and then tweak the configuration by using the Watson OpenScale user interface. Using metrics from the Watson OpenScale quality monitor and fairness monitor, the data scientists determine how well the model predicts outcomes and whether it produces any biased outcomes. They also get insights into how the model comes to decisions so that the decisions can be explained to the mortgage applicants.
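A quality monitor works by comparing the model's predictions with labeled feedback data, such as the actual outcomes of past applications. The sketch below computes a few standard binary classification metrics from such a comparison and checks accuracy against a threshold. It assumes binary labels where 1 means approved, and the feedback values and the 0.8 threshold are hypothetical; it illustrates the idea and is not the Watson OpenScale implementation.

```python
from typing import Dict, Sequence


def quality_metrics(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = approved)."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and the same length")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical feedback: actual outcomes versus the model's earlier predictions.
    actual = [1, 1, 0, 0, 1, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0, 1, 0]
    metrics = quality_metrics(actual, predicted)
    accuracy_threshold = 0.8  # hypothetical alert threshold
    print(metrics, "below threshold:", metrics["accuracy"] < accuracy_threshold)
```

A metric that drops below its threshold is the kind of production signal that can trigger the retraining described in the next step.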
5. Automate the AI lifecycle
Your team can automate and simplify the MLOps and AI lifecycle with Orchestration Pipelines.
What you can use | What you can do | Best to use when |
---|---|---|
Orchestration Pipelines | Use pipelines to create repeatable and scheduled flows that automate notebook, Data Refinery, and machine learning pipelines, from data ingestion to model training, testing, and deployment. | You want to automate some or all of the steps in an MLOps flow. |
Example: Golden Bank's automated ML lifecycle
The data scientists at Golden Bank can use pipelines to automate their complete Data Science and MLOps lifecycle, which simplifies the model retraining process.
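Conceptually, an orchestration pipeline runs a sequence of nodes and can act on the results of earlier nodes, for example deploying a retrained model only when it outperforms the one already in production. The plain-Python sketch below models that control flow. It does not use the Orchestration Pipelines service, and every step name, metric value, and the `stop` flag convention are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], None]]


def run_pipeline(steps: List[Step], context: Context) -> List[str]:
    """Run steps in order on a shared context; stop early if a step sets 'stop'."""
    executed: List[str] = []
    for name, step in steps:
        step(context)
        executed.append(name)
        if context.get("stop"):
            break
    return executed


# Hypothetical nodes. In a real pipeline each node would run a notebook,
# a Data Refinery flow, or a training or deployment job.
def ingest(ctx: Context) -> None:
    ctx["rows_loaded"] = 1000  # placeholder for a data ingestion job


def train(ctx: Context) -> None:
    ctx["candidate_accuracy"] = 0.86  # placeholder for a retraining job


def evaluate(ctx: Context) -> None:
    # Continue to deployment only if the retrained model beats the deployed one.
    if ctx["candidate_accuracy"] <= ctx["deployed_accuracy"]:
        ctx["stop"] = True


def deploy(ctx: Context) -> None:
    ctx["deployed_accuracy"] = ctx["candidate_accuracy"]


RETRAINING_PIPELINE: List[Step] = [
    ("ingest", ingest),
    ("train", train),
    ("evaluate", evaluate),
    ("deploy", deploy),
]

if __name__ == "__main__":
    run_context: Context = {"deployed_accuracy": 0.82}
    print(run_pipeline(RETRAINING_PIPELINE, run_context), run_context["deployed_accuracy"])
```

Passing one shared context between steps mirrors how pipeline nodes hand outputs to later nodes, and the early stop keeps a weaker candidate model out of production. A scheduler could run the same pipeline on a fixed cadence.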
Tutorials for Data Science and MLOps
Tutorial | Description | Expertise for tutorial |
---|---|---|
Orchestrate an AI pipeline with model monitoring | Train a model, promote it to a deployment space, and deploy the model. | Run a notebook. |
Orchestrate an AI pipeline with data integration | Create an end-to-end pipeline that prepares data and trains a model. | Use the Orchestration Pipelines drag and drop interface to create a pipeline. |