Course Description
This course takes you from ad-hoc Jupyter notebooks to production-style MLOps workflows tailored for AIOps use cases such as anomaly detection and incident prediction. You will learn how to make models reproducible, trackable, and deployable at scale using MLflow for experiment tracking and model packaging, and Kubeflow Pipelines for end-to-end orchestration on Kubernetes. Through hands-on labs, you will convert exploratory code into parameterized scripts, track experiments with MLflow and MinIO, deploy inference services to Kubernetes, and automate train → register → validate → deploy pipelines for AIOps workloads.
Prerequisites
- Python Basics
- Kubernetes Basics
- Machine Learning Fundamentals
Course Highlights
- Why MLOps Is Critical for AIOps
This module explains the shift from exploratory notebook-based experimentation to production-grade ML workflows for AIOps, showing how MLOps practices improve reliability, observability, and governance for models used in incident management and anomaly detection.
- Lab 1.1 – From Notebook to Production
  - Convert an anomaly detection Jupyter notebook into a `train.py` script.
  - Introduce CLI arguments, random seeds, and reproducibility best practices.
  - Run multiple configurations manually to prepare for later automation.
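A minimal sketch of what the converted script's entry point might look like. The flag names (`--contamination`, `--seed`) and defaults are illustrative, not the lab's exact interface:

```python
# train.py -- reproducible training entry point (sketch)
import argparse
import random


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train an anomaly detection model")
    parser.add_argument("--contamination", type=float, default=0.05,
                        help="expected fraction of anomalous points")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed, so repeated runs are comparable")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    random.seed(args.seed)  # fix randomness so results are reproducible
    # ... load data, fit the anomaly detector, save the model ...
    return args


if __name__ == "__main__":
    main()
```

Running it several times with different flag values (e.g. `python train.py --seed 7`) is exactly the manual sweep that later gets automated.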
- Experiment Tracking & Model Packaging with MLflow
This module introduces MLflow Tracking for capturing parameters, metrics, and artifacts across experiments, and shows how to configure MinIO as an S3-compatible artifact store integrated with an MLflow Tracking Server for reproducible AIOps workflows.
- Lab 2.1 – Setting Up MLflow & MinIO
  - Deploy MinIO and an MLflow Tracking Server.
  - Verify UI access and connectivity between MLflow and MinIO as an artifact backend.
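A rough sketch of the local wiring between the two services; the ports, bucket name, and credentials are placeholders, and MinIO itself is assumed to be already running:

```shell
# Point MLflow's S3 client at MinIO instead of AWS (values are placeholders).
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin

# Start a tracking server that stores metadata in SQLite
# and artifacts in a MinIO bucket.
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --artifacts-destination s3://mlflow-artifacts \
  --host 0.0.0.0 --port 5000
```

With both UIs reachable (MinIO console and MLflow at port 5000 here), connectivity can be verified by logging a run and checking that its artifacts appear in the bucket.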
- Lab 2.2 – Logging Parameters, Metrics, and Artifacts
  - Instrument `train.py` with MLflow Tracking API calls.
  - Log parameters, metrics, and model artifacts, then explore runs in the MLflow UI.
- Lab 2.3 – Packaging Models for Reproducibility
  - Create `MLproject` and `conda.yaml` files for reproducible runs.
  - Define entry points and re-run experiments using the MLflow CLI.
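The two files might look roughly like this; the project name, parameters, and pinned versions are placeholders:

```yaml
# MLproject
name: aiops-anomaly-detection
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      contamination: {type: float, default: 0.05}
      seed: {type: int, default: 42}
    command: "python train.py --contamination {contamination} --seed {seed}"
```

```yaml
# conda.yaml
name: aiops-anomaly-detection
channels: [conda-forge]
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - scikit-learn
```

With these in place, `mlflow run . -P seed=7` recreates the environment and re-runs the entry point with the given parameter.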
- Deploying & Serving AIOps Models
This module focuses on moving from trained models to live inference endpoints suitable for real-time anomaly detection, covering different ways to serve MLflow models and expose them via REST APIs for consumption by AIOps systems.
- Lab 3.1 – Serving Models with MLflow
  - Serve the trained model locally from MLflow runs.
  - Register the model in the MLflow Model Registry and serve it from there.
  - Test predictions using `curl` and Python `requests`.
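A sketch of testing predictions from Python. The feature names and values are invented for illustration; the `dataframe_split` shape is the JSON body MLflow 2.x scoring servers expect at `/invocations`:

```python
# Sketch: building and (optionally) sending a scoring request.
import json


def build_payload(rows, columns):
    """Build the JSON body an MLflow 2.x scoring server expects."""
    return {"dataframe_split": {"columns": columns, "data": rows}}


payload = build_payload(rows=[[0.1, 250.0], [0.9, 1800.0]],
                        columns=["cpu_util", "latency_ms"])
body = json.dumps(payload)

# With a model served locally, e.g.
#   mlflow models serve -m "models:/anomaly-detector/1" -p 5001
# the request would be:
# import requests
# resp = requests.post("http://localhost:5001/invocations", data=body,
#                      headers={"Content-Type": "application/json"})
# print(resp.json())
```

The equivalent `curl` call posts the same JSON body to the same endpoint.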
- Lab 3.2 – Containerizing and Deploying to Kubernetes
  - Package the serving application into a Docker image.
  - Deploy the model-serving service to Kubernetes using `Deployment` and `Service` resources.
  - Verify access to predictions through a REST API endpoint.
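The two resources might be sketched as below; the image reference, labels, and ports are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anomaly-model
spec:
  replicas: 2
  selector:
    matchLabels: {app: anomaly-model}
  template:
    metadata:
      labels: {app: anomaly-model}
    spec:
      containers:
        - name: model-server
          image: registry.example.com/anomaly-model:latest  # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: anomaly-model
spec:
  selector: {app: anomaly-model}
  ports:
    - port: 80
      targetPort: 8080
```

The `Service` gives the serving pods a stable address inside the cluster; predictions can then be verified against that endpoint.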
- Orchestrating AIOps Pipelines with Kubeflow
This module teaches how to automate the full ML lifecycle — from training to validation to deployment — using Kubeflow Pipelines, and how to connect Kubeflow components with MLflow to build traceable, production-style AIOps pipelines.
- Lab 4.1 – Exploring Kubeflow Pipelines
  - Access the Kubeflow UI and inspect pre-built sample pipelines.
  - Run sample pipelines and observe execution graphs and artifacts.
- Lab 4.2 – Building the Training & Registration Components
  - Create a `train` component that uses the existing `train.py` script.
  - Create a `register` component that pushes the trained model to the MLflow Model Registry.
  - Run a two-step Kubeflow pipeline to train and log a model to MLflow.
- Lab 4.3 – Building the Full Train → Validate → Deploy Pipeline
  - Add a `validate` component that checks whether the model’s anomaly rate is within acceptable limits.
  - Add a `deploy` component that consumes the trained model and deploys it to production.
  - Compile, upload, and trigger a four-step Kubeflow pipeline to automate model training and serving end to end.
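The core check inside a `validate` component can be sketched as below; the threshold values are illustrative, not the lab's actual limits:

```python
# Sketch: validation gate for the pipeline's validate step.
def anomaly_rate_ok(rate: float, low: float = 0.001, high: float = 0.10) -> bool:
    """Pass only if the model flags a plausible fraction of points as anomalous.

    A rate near zero suggests the model detects nothing; a very high rate
    suggests it would flag normal traffic as incidents.
    """
    return low <= rate <= high
```

In the pipeline, the `deploy` step runs only when this gate passes, so a model that drifts outside acceptable limits never reaches production automatically.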
