Key Management Policy
Formal key management process for the AI Factory platform.
Scope
This policy applies to all cryptographic key material managed by the AI Factory service operator role:
| Key Type | Algorithm | Used For |
|---|---|---|
| SSH keys (mido_infra) | Ed25519 | Infrastructure host access (all bare-metal nodes, management cluster VMs, Hedgehog fabric) |
| TLS certificates | RSA (CA-signed) | OpenStack API endpoints (HAProxy external/internal), RabbitMQ AMQP, ProxySQL, backend services |
| WireGuard VPN keys | Curve25519 | Operator and user VPN access per tenant |
| Ansible Vault passwords | AES-256 | Encryption of secrets at rest in the configuration repository |
For rotation schedules and individual runbooks, see the Key Rotation Overview.
Roles and Responsibilities
The service operator is the sole role authorized to perform key management operations on the AI Factory platform. This includes key creation, distribution, rotation, revocation, destruction, and maintaining the operations log.
No other role (tenant user, tenant admin, or read-only observer) has access to platform key material.
Key Lifecycle
Creation
Keys are generated using the standards below. Generation must occur on a trusted workstation or the designated bastion host, never on a shared or untrusted machine.
| Key Type | Generation Command | Standard |
|---|---|---|
| SSH key pair | ssh-keygen -t ed25519 -f <keyfile> -C "<comment>" -N "" | Ed25519, no passphrase (key protected by access controls on the workstation) |
| Ansible Vault password | openssl rand -base64 32 \| tr -d '\n' > <passfile> | 32-byte random, base64-encoded |
| WireGuard private key | wg genkey | Curve25519 |
| TLS leaf certificates | Regenerated by ./platform-setup.sh --reconfigure against the environment CA | RSA, signed by the environment root CA |
For SSH and Ansible Vault, the generation steps are embedded in the rotation runbooks and follow the same standard when creating a key from scratch.
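As a minimal sketch of the generation standards in the table, the helpers below wrap the SSH and Ansible Vault commands in a subshell with a restrictive umask, so that newly created files are never group- or world-readable. The function names, the umask step, and the paths in the usage comments are illustrative assumptions and are not taken from the rotation runbooks.

```shell
# Hypothetical helpers; names and paths are illustrative assumptions.

# Generate an Ed25519 key pair without a passphrase, as in the table above.
gen_ssh_key() {
    local keyfile="$1" comment="$2"
    # Subshell so the umask change does not leak into the caller.
    ( umask 077 && ssh-keygen -q -t ed25519 -f "$keyfile" -C "$comment" -N "" )
}

# Generate a 32-byte random Ansible Vault password, base64-encoded,
# written without a trailing newline.
gen_vault_pass() {
    local passfile="$1"
    ( umask 077 && openssl rand -base64 32 | tr -d '\n' > "$passfile" )
}

# Usage (paths are placeholders):
#   gen_ssh_key ./new_key "operator@workstation"
#   gen_vault_pass ./new_vault_pass.txt
```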
Distribution
Key material is distributed through the following channels, depending on type:
- SSH private keys are stored locally on the operator workstation (~/.ssh/). The corresponding public key is registered in OpenStack (openstack keypair) and in authorized_keys on all hosts.
- TLS certificates are stored encrypted with the environment Ansible Vault password and must be managed exclusively by the service operator. Access requires the vault password from the credential store.
- WireGuard private keys are generated by each operator/user on their own workstation and never shared. Only the public key is registered — with the platform API for user keys, or in the server configuration for operator keys.
- Ansible Vault passwords are stored in the team credential management system. Access is restricted to service operators.
No key material is distributed over unencrypted channels (plaintext email, unencrypted chat).
Storage
| Key Type | At-rest storage | Access control |
|---|---|---|
| SSH private keys | Operator workstation ~/.ssh/, mode 600 | Local OS user access only |
| TLS cert files (*.pem, *.key, *.crt) | Ansible Vault-encrypted in configuration repo | Vault password required to decrypt |
| Ansible Vault passwords | Team credential management system | Service operator role only |
| WireGuard private keys | Operator/user workstation; never stored on platform | Workstation owner only |
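The mode 600 requirement for SSH private keys can be spot-checked on a workstation. The sketch below is one possible check, not a mandated control: the function name is hypothetical, and it assumes that a private key is any file with a matching .pub file next to it.

```shell
# List private keys in a directory whose mode is not exactly 600.
# Assumption: a private key is any file that has a "<name>.pub" beside it.
list_loose_private_keys() {
    local dir="$1" pub priv
    for pub in "$dir"/*.pub; do
        [ -e "$pub" ] || continue     # no .pub files: the glob stayed literal
        priv="${pub%.pub}"
        [ -f "$priv" ] || continue    # public key without a local private key
        # find prints the path only when the mode differs from 600
        find "$priv" ! -perm 600
    done
}

# Usage: list_loose_private_keys "$HOME/.ssh"
# No output means every detected private key is mode 600.
```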
Rotation
Planned rotation follows the schedule and runbooks in Key Rotation Overview:
- SSH keys: every 6 months (runbook: Rotate SSH Keys)
- TLS certificates: every 12 months (runbook: Rotate TLS Certificates)
- WireGuard VPN keys: every 12 months (runbook: Rotate WireGuard VPN Keys)
- Ansible Vault passwords: every 12 months (runbook: Rotate Ansible Vault Passwords)
Rotation must complete before the key expires. For TLS certificates, rotation is recommended during the 60 days before the notAfter date.
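A certificate's notAfter date can be checked against the 60-day window with openssl x509 -checkend. The sketch below reads the recommendation as "rotate once the certificate is within 60 days of expiry"; the function name and return-code convention are assumptions made for illustration.

```shell
# Return 0 when the certificate expires within the next <days> days
# (default 60), 1 when it is valid beyond that window, and 2 when the
# file cannot be read. Hypothetical helper, not part of the runbooks.
cert_needs_rotation() {
    local cert="$1" days="${2:-60}"
    [ -r "$cert" ] || return 2
    # -checkend N exits 0 only if the cert is still valid N seconds from now.
    # Note: a readable file that is not a valid certificate is also reported
    # as due, because openssl exits non-zero for it.
    if openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage (path is a placeholder):
#   cert_needs_rotation ./haproxy.pem && echo "rotation due"
```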
Revocation
Unscheduled revocation is required when a key is known or suspected to be compromised (lost workstation, leaked repository, unauthorized access).
The revocation procedure is the same as the rotation runbook for that key type, executed immediately and out of the normal schedule:
- Identify the compromised key type and scope.
- Execute the corresponding rotation runbook immediately.
- Audit access logs for any unauthorized use between the suspected compromise time and revocation.
- Record the incident in the configuration repository commit message (see Audit Trail).
For SSH key compromise affecting infrastructure hosts, also review shell history and system logs (/var/log/auth.log) on affected hosts before rotation completes.
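For the log review step, one way to narrow /var/log/auth.log to the compromised key is to search for its fingerprint. This sketch assumes sshd records the key fingerprint on its "Accepted publickey" lines, which is the behavior of recent OpenSSH releases at the default log level; the function name is hypothetical.

```shell
# Print sshd "Accepted publickey" lines that mention a given key fingerprint.
# Arguments: <fingerprint, e.g. SHA256:...> [log file, default /var/log/auth.log]
logins_with_key() {
    local fingerprint="$1" logfile="${2:-/var/log/auth.log}"
    grep -F 'Accepted publickey' "$logfile" | grep -F -- "$fingerprint"
}

# Usage (key path is a placeholder):
#   fp="$(ssh-keygen -lf ~/.ssh/old_key.pub | awk '{print $2}')"
#   logins_with_key "$fp"
```

Rotated logs (for example auth.log.1) are not covered by the default path and would need to be passed explicitly.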
Destruction
After a key is rotated, the old key material must be destroyed:
| Key Type | Destruction step |
|---|---|
| SSH private key | Delete the old private key file: rm ~/.ssh/old_key.pem. The old public key is removed from all hosts as part of the SSH rotation runbook (Step 4). |
| Ansible Vault password | Remove temporary password files from disk (rm -f /tmp/old_vault_pass.txt). This is Step 8 of the Ansible Vault rotation runbook. The old vault password is no longer valid once all files are re-keyed. |
| TLS leaf certificates | Deleted as part of Step 2 of the TLS rotation runbook. Do not delete the CA or its private key. |
| WireGuard private key | Remove old private key from workstation after confirming the new key establishes a handshake. |
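The WireGuard row requires confirming a handshake before the old key is removed. A rough sketch of that confirmation is shown below. It parses the output of wg show <interface> latest-handshakes, which prints one peer public key and one Unix timestamp per line. The function name, the wg0 interface name, and the 180-second freshness threshold are assumptions.

```shell
# Read "wg show <interface> latest-handshakes" output on stdin and succeed
# if the given peer public key has a handshake no older than <max_age> seconds.
# A timestamp of 0 means no handshake has happened yet.
peer_handshake_is_fresh() {
    local peer="$1" max_age="${2:-180}" now
    now="$(date +%s)"
    awk -v peer="$peer" -v now="$now" -v max="$max_age" '
        $1 == peer && $2 > 0 && (now - $2) <= max { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage (interface and key values are placeholders):
#   wg show wg0 latest-handshakes | peer_handshake_is_fresh "<peer public key>" \
#       && rm -f <old private key file>
```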
Approval Workflow
All changes to key material must be reviewed and approved by at least one other service operator before being applied.
Emergency revocations that cannot wait for peer review may be applied immediately, but must be followed by a review within 24 hours.
All key management operations must be announced to the service operator team and documented in the Audit Trail.
Audit Trail
Every key management operation must be recorded in the team's operations log. Each entry must include:
- The key type rotated or revoked
- The environment affected
- The date the operation was performed
- The operator who performed it
- The reason (scheduled rotation or incident reference)
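To illustrate the required fields, the sketch below appends one entry per operation to a log file as a tab-separated line. The file format, the function name, and the example values are assumptions; the policy does not prescribe how the team's operations log is stored.

```shell
# Append one operations-log entry as a tab-separated line:
#   date (UTC, ISO 8601), key type, environment, operator, reason
# Refuses to write an entry if any field is empty.
log_key_operation() {
    local logfile="$1" key_type="$2" environment="$3" operator="$4" reason="$5"
    if [ -z "$key_type" ] || [ -z "$environment" ] || [ -z "$operator" ] || [ -z "$reason" ]; then
        echo "log_key_operation: all fields are required" >&2
        return 1
    fi
    printf '%s\t%s\t%s\t%s\t%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$key_type" "$environment" "$operator" "$reason" >> "$logfile"
}

# Usage (values are placeholders):
#   log_key_operation ./key-ops.log "SSH" "staging" "alice" "scheduled rotation"
```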