Software Update Runbooks
5 items
Key Rotation Runbooks
4 items
Software Updates Overview
Overview of all AI Factory software update procedures
Key Rotation Overview
This document is the entry point for all AI Factory key rotation procedures. It describes which key types exist, how often each is rotated, and which rotations require downtime. The runbooks for those procedures are linked below.
CAPI cluster health alerts
Responding to CAPI cluster health and provisioning alerts
Ceph node maintenance
Safe procedure for taking a Ceph storage node offline and returning it to service
Deleting orphaned tenant clusters
Getting provisioning logs
Hedgehog switch credentials update
Updating switch user credentials in the Hedgehog fabricator spec
Hedgehog VM credentials update
Updating the core user password and SSH authorized keys on the Hedgehog control node VM
Management Cluster Subnet Migration
Overview
Personal data disposal procedure
Removing personal data in response to a GDPR erasure request
Router Host Log Access
Getting logs from the router
Switch heartbeat alert
Responding to Switch Heartbeat Evaluation alerts caused by a hedgehog-agent crash loop
Tenant service termination — data disposal procedure
Removing all tenant data upon service termination