Operator Overview
Operator Overview for AI Factory
Runbooks
15 items
Observability
3 items
Router Box Setup
Setting up the router box.
IaaS Console Configuration
Configuring JSON for the IaaS Console.
Network Control Node Setup
Provisioning the control node and installing SONiC.
Ceph Setup
Provisioning Ceph storage clusters via script on dedicated nodes.
Bare-Metal Node Enrollment and Server Creation
Working with bare-metal nodes and servers.
Azure SSO Setup Guide
Generating Azure SSO credentials for the IaaS Console.
Google SSO Setup Guide
Generating Google SSO credentials.
Key Management Policy
Formal key management process for the AI Factory platform.
CAPI Management Cluster
Cluster API (CAPI) runs in the management cluster and is used by OpenStack Magnum to provision tenant Kubernetes clusters.
Deployment Scripts
Deploying GPU infrastructure clusters.
GitHub Container Registry (GHCR) Authentication
Authenticating with ghcr.io.
IPMI (Intelligent Platform Management Interface) and SNMP (Simple Network Management Protocol) Observability Setup
Configuring hardware and network monitoring for bare-metal servers and switches.
Management TLS
Configuring Let's Encrypt certificates via Azure DNS for the IaaS Console and OBS.
Operator API Usage Guide
Using the IaaS Operator APIs to manage users and tenants.
Operator VPN: Adding operators to the WireGuard configuration
Granting a new user access to the operator WireGuard VPN.
Public IP Access
Setting up public IP access.
Router Services Configuration
Configuring services on the router box.
VPN Configuration as a Service Operator
Setting up the VPN as a Service Operator.
Cluster Add-ons
Monitoring the health of cluster add-ons and performing maintenance tasks.
Cluster Add-ons API Reference
Managing add-ons on Kubernetes clusters through the API endpoints.
OS Requirements
Listing the OS requirements.