Version: v25.1.0

Key Management Policy

Formal key management process for the AI Factory platform.

Scope

This policy applies to all cryptographic key material managed by the AI Factory service operator role:

| Key Type | Algorithm | Used For |
| --- | --- | --- |
| SSH keys (mido_infra) | Ed25519 | Infrastructure host access (all bare-metal nodes, management cluster VMs, Hedgehog fabric) |
| TLS certificates | RSA (CA-signed) | OpenStack API endpoints (HAProxy external/internal), RabbitMQ AMQP, ProxySQL, backend services |
| WireGuard VPN keys | Curve25519 | Operator and user VPN access per tenant |
| Ansible Vault passwords | AES-256 | Encryption of secrets at rest in the configuration repository |

For rotation schedules and individual runbooks, see the Key Rotation Overview.

Roles and Responsibilities

The service operator is the sole role authorized to perform key management operations on the AI Factory platform. This includes key creation, distribution, rotation, revocation, destruction, and maintaining the operations log.

No other role (tenant user, tenant admin, or read-only observer) has access to platform key material.

Key Lifecycle

Creation

Keys are generated using the standards below. Generation must occur on a trusted workstation or the designated bastion host, never on a shared or untrusted machine.

| Key Type | Generation Command | Standard |
| --- | --- | --- |
| SSH key pair | `ssh-keygen -t ed25519 -f <keyfile> -C "<comment>" -N ""` | Ed25519, no passphrase (key protected by access controls on the workstation) |
| Ansible Vault password | `openssl rand -base64 32 \| tr -d '\n' > <passfile>` | 32-byte random, base64-encoded |
| WireGuard private key | `wg genkey` | Curve25519 |
| TLS leaf certificates | Regenerated by `./platform-setup.sh --reconfigure` against the environment CA | RSA, signed by the environment root CA |

For SSH and Ansible Vault, the generation steps are embedded in the rotation runbooks and follow the same standard when creating a key from scratch.
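The generation standards above can be exercised together in one short script. This is a minimal sketch, not a replacement for the rotation runbooks: the scratch directory, file names, and key comment are hypothetical, and the SSH and WireGuard steps are skipped when the corresponding tool is not installed.

```shell
#!/usr/bin/env bash
# Sketch: generate key material per the standards above into a private scratch directory.
# Directory and file names are illustrative assumptions, not platform conventions.
umask 077                          # files created below are readable by the owner only
outdir="$(mktemp -d)"

# Ansible Vault password: 32 random bytes, base64-encoded, trailing newline stripped.
openssl rand -base64 32 | tr -d '\n' > "$outdir/vault_pass"
vault_pass_len="$(wc -c < "$outdir/vault_pass" | tr -d ' ')"

# SSH key pair: Ed25519 with an empty passphrase, as the policy specifies.
if command -v ssh-keygen >/dev/null 2>&1; then
  ssh-keygen -q -t ed25519 -f "$outdir/infra_key" -C "example-comment" -N ""
fi

# WireGuard key pair: the private key stays local; only the public key is registered.
if command -v wg >/dev/null 2>&1; then
  wg genkey > "$outdir/wg_private"
  wg pubkey < "$outdir/wg_private" > "$outdir/wg_public"
fi

echo "vault password length: $vault_pass_len characters"
```

Because 32 bytes encode to 44 base64 characters, checking the length is a cheap way to confirm the vault password file was written without a trailing newline.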

Distribution

Key material is distributed through the following channels, depending on type:

  • SSH private keys are stored locally on the operator workstation (~/.ssh/). The corresponding public key is registered in OpenStack (openstack keypair) and in authorized_keys on all hosts.
  • TLS certificates are stored encrypted with the environment Ansible Vault password and must be managed exclusively by the service operator. Access requires the vault password from the credential store.
  • WireGuard private keys are generated by each operator/user on their own workstation and never shared. Only the public key is registered — with the platform API for user keys, or in the server configuration for operator keys.
  • Ansible Vault passwords are stored in the team credential management system. Access is restricted to service operators.

No key material is distributed over unencrypted channels (plaintext email, unencrypted chat).

Storage

| Key Type | At-rest storage | Access control |
| --- | --- | --- |
| SSH private keys | Operator workstation `~/.ssh/`, mode 600 | Local OS user access only |
| TLS cert files (`*.pem`, `*.key`, `*.crt`) | Ansible Vault-encrypted in configuration repo | Vault password required to decrypt |
| Ansible Vault passwords | Team credential management system | Service operator role only |
| WireGuard private keys | Operator/user workstation; never stored on platform | Workstation owner only |
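The mode 600 requirement for SSH private keys can be spot-checked on a workstation. The following is a sketch under stated assumptions: the function name is hypothetical, and it treats every regular file other than `*.pub`, `known_hosts*`, `config`, and `authorized_keys` as a private key, which may not match every operator's `~/.ssh/` layout.

```shell
# Sketch: list private-key candidates under a directory whose mode is not exactly 600.
# Which files count as private keys is an assumption (see the exclusions below).
# Note: a stricter mode such as 400 is also reported, because the match is exact.
find_loose_keys() {
  find "$1" -type f \
    ! -name '*.pub' ! -name 'known_hosts*' ! -name 'config' ! -name 'authorized_keys' \
    ! -perm 600
}
```

Running `find_loose_keys ~/.ssh` prints nothing when every candidate file has mode 600; any path it prints can be corrected with `chmod 600 <path>`.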

Rotation

Planned rotation follows the schedule and runbooks in the Key Rotation Overview.

Rotation must complete before the key expires. For TLS certificates, rotation is recommended during the 60 days before the certificate's notAfter date.
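One way to check whether a TLS certificate has entered the 60-day window is OpenSSL's `-checkend` option. This is a minimal sketch: the function name is hypothetical, and it assumes the certificate is available as a decrypted PEM file.

```shell
# Sketch: succeed (exit 0) when a PEM certificate expires within the rotation window.
# The 60-day default mirrors the policy; the function name is hypothetical.
cert_needs_rotation() {
  cert="$1"
  window_days="${2:-60}"
  # `openssl x509 -checkend N` exits 0 if the certificate is still valid N seconds from now.
  # An unreadable or malformed file also exits non-zero, so it is reported as needing attention.
  if openssl x509 -in "$cert" -noout -checkend "$((window_days * 86400))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, `cert_needs_rotation <cert>.pem && echo "rotate soon"` prints the reminder only for a certificate inside the window; a second argument overrides the number of days.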

Revocation

Unscheduled revocation is required when a key is known or suspected to be compromised (lost workstation, leaked repository, unauthorized access).

The revocation procedure is the same as the rotation runbook for that key type, executed immediately and out of the normal schedule:

  1. Identify the compromised key type and scope.
  2. Execute the corresponding rotation runbook immediately.
  3. Audit access logs for any unauthorized use between the suspected compromise time and revocation.
  4. Record the incident in the configuration repository commit message (see Audit Trail).

For SSH key compromise affecting infrastructure hosts, also review shell history and system logs (/var/log/auth.log) on affected hosts before rotation completes.
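The log review for a compromised SSH key can start by listing logins made with that specific key. The sketch below assumes OpenSSH's usual `Accepted publickey ... SHA256:<fingerprint>` log lines; the function name is hypothetical, and the log format may differ with a non-default sshd configuration.

```shell
# Sketch: list successful public-key logins made with one specific key.
# Obtain the fingerprint of the compromised key with: ssh-keygen -lf <old_key>.pub
logins_with_key() {
  fingerprint="$1"   # fingerprint string as printed by ssh-keygen, starting with SHA256:
  logfile="$2"       # for example /var/log/auth.log
  # Fixed-string matching (-F) avoids treating '+' or '/' in the fingerprint as regex syntax.
  grep -F 'Accepted publickey' "$logfile" | grep -F -- "$fingerprint"
}
```

Each printed line carries the timestamp, account, and source address, which can then be compared against the suspected compromise time.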

Destruction

After a key is rotated, the old key material must be destroyed:

| Key Type | Destruction step |
| --- | --- |
| SSH private key | Delete the old private key file: `rm ~/.ssh/old_key.pem`. The old public key is removed from all hosts as part of the SSH rotation runbook (Step 4). |
| Ansible Vault password | Remove temporary password files from disk (`rm -f /tmp/old_vault_pass.txt`). This is Step 8 of the Ansible Vault rotation runbook. The old vault password is no longer valid once all files are re-keyed. |
| TLS leaf certificates | Deleted as part of Step 2 of the TLS rotation runbook. Do not delete the CA or its private key. |
| WireGuard private key | Remove old private key from workstation after confirming the new key establishes a handshake. |
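After the SSH rotation runbook has run, it can be useful to confirm that the old public key is really gone from a host's authorized_keys file. This is a sketch only: the function name and return codes are hypothetical, and it checks a single local file rather than all hosts.

```shell
# Sketch: succeed only if the old public key no longer appears in an authorized_keys file.
# Returns 0 when the key is absent, 1 when it is still present, 2 when the .pub file is unusable.
old_key_removed() {
  old_pub="$1"
  auth_file="$2"
  # Field 2 of a .pub file is the base64 key blob, which does not depend on the comment.
  blob="$(awk '{print $2; exit}' "$old_pub")"
  [ -n "$blob" ] || return 2
  if grep -qF -- "$blob" "$auth_file"; then
    return 1
  fi
  return 0
}
```

Matching on the key blob rather than the whole line means a changed comment or added options prefix cannot hide a key that is still authorized.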

Approval Workflow

All changes to key material must be reviewed and approved by at least one other service operator before being applied.

Emergency revocations that cannot wait for peer review may be applied immediately, but must be followed by a review within 24 hours.

All key management operations must be announced to the service operator team and documented in the Audit Trail.

Audit Trail

Every key management operation must be recorded in the team's operations log. Each entry must include:

  • The key type rotated or revoked
  • The environment affected
  • The date the operation was performed
  • The operator who performed it
  • The reason (scheduled rotation or incident reference)
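The five required fields map naturally onto one line per operation. The sketch below assumes a plain tab-separated file; the `OPS_LOG` path, the column order, and the function name are hypothetical, since the policy does not prescribe the log's format.

```shell
# Sketch: append one tab-separated audit entry containing the five required fields.
# OPS_LOG and the column order (date, key type, environment, operator, reason) are assumptions.
OPS_LOG="${OPS_LOG:-./key-operations.log}"

log_key_operation() {
  # Arguments: key_type environment operator reason
  if [ "$#" -ne 4 ]; then
    echo "usage: log_key_operation <key_type> <environment> <operator> <reason>" >&2
    return 1
  fi
  # The date is recorded in UTC so entries from different workstations sort consistently.
  printf '%s\t%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%d)" "$1" "$2" "$3" "$4" >> "$OPS_LOG"
}
```

A call such as `log_key_operation "TLS certificates" "<environment>" "<operator>" "scheduled rotation"` appends one entry; refusing calls with missing arguments keeps incomplete records out of the log.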