
AI Factory v25.1.0

Midokura Team

Version 25.1.0 of AI Factory is now available.

Overview

These release notes describe the revised steps, configuration details, and changes for provisioning and managing an AI Factory cluster under the new release.

Long-term observability storage via Ceph

AI Factory now supports Ceph-backed storage for the observability stack. The storage backend is selected via the phoenix_observability_storage inventory variable, which accepts two values: "local" (node-local disk) or "ceph" (Ceph-backed persistent storage). "local" is suitable for short retention periods; "ceph" is recommended when retaining logs or metrics for longer than one week. When using Ceph, the retention period and the amount of reserved storage are each configurable independently for Loki (logs) and Prometheus (metrics).
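
The selection rule above can be expressed as a small pre-flight check. This is only a sketch: the helper names is_valid_observability_storage and recommend_observability_storage are hypothetical and not part of AI Factory, and the 7-day threshold simply restates the one-week recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with AI Factory.

# Succeed only for the two values phoenix_observability_storage accepts.
is_valid_observability_storage() {
  case "$1" in
    local|ceph) return 0 ;;
    *) return 1 ;;
  esac
}

# Suggest a backend for a retention period given in whole days:
# "ceph" for more than 7 days, "local" otherwise.
recommend_observability_storage() {
  if [ "$1" -gt 7 ]; then
    echo ceph
  else
    echo local
  fi
}
```

For example, recommend_observability_storage 30 suggests "ceph", and the result can be checked with is_valid_observability_storage before it is written to the inventory.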

End-to-end DNS resolution

AI Factory now provides fully integrated DNS resolution across all three internal network levels (physical underlay, OpenStack overlay, and IaaS Kubernetes services), eliminating the need for manual /etc/hosts entries. Every host, VM, and container within the IaaS network can resolve hostnames at all levels (e.g., <host>.phoenix.bcn, <vm>.ost.phoenix.bcn, <service>.iaas.phoenix.bcn) as well as public Internet names, with DNS forwarding handled transparently between levels. The IaaS Kubernetes CoreDNS is now exposed as a LoadBalancer Service and extended with the k8s_gateway plugin so that it resolves the hostnames of LoadBalancer-type Services. VPN clients are pointed to the correct DNS servers automatically through the VPN configuration, so no user-side DNS configuration is required.
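
To spot-check resolution at each level from a host, VM, or container, something like the following may help. It is a sketch under assumptions: the helper names fqdn_for_level and check_resolves are hypothetical, the level labels (underlay, overlay, iaas) are shorthand chosen here, the domain suffixes are copied from the examples above and may differ in your deployment, and getent is assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with AI Factory.

# Build the fully qualified name for a short name at a given level.
# Suffixes follow the example hostnames in the release notes.
fqdn_for_level() {
  local level="$1" name="$2"
  case "$level" in
    underlay) echo "${name}.phoenix.bcn" ;;
    overlay)  echo "${name}.ost.phoenix.bcn" ;;
    iaas)     echo "${name}.iaas.phoenix.bcn" ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
}

# Report whether a name resolves through the system resolver.
check_resolves() {
  if getent hosts "$1" >/dev/null; then
    echo "ok   $1"
  else
    echo "FAIL $1"
  fi
}
```

A typical call would be check_resolves "$(fqdn_for_level overlay <vm>)", repeated for one known name per level.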

Internal TLS for OpenStack Services

AI Factory now encrypts internal communication between OpenStack services and the surrounding platform components using TLS. All service-to-service traffic within the control plane is authenticated and protected in transit, eliminating cleartext exchanges on the internal network.

Migration steps

Remove legacy CoreDNS ConfigMap keys

After running the provision-management-cluster playbook, remove the legacy coredns-custom ConfigMap keys left by the previous iaas-console role. These keys define a duplicate DNS zone that prevents CoreDNS from starting:

kubectl patch configmap coredns-custom -n kube-system \
  --type=json \
  -p '[{"op":"remove","path":"/data/custom.server"},{"op":"remove","path":"/data/custom.override"}]'

This is a one-time migration step. If the keys do not exist, the command returns an error, which can be safely ignored.
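
A JSON Patch is applied as a whole, so if only one of the two keys is present the combined command should fail without removing either. A more tolerant variant, sketched below, removes each key with its own patch. The function names remove_patch_for_key and remove_legacy_coredns_keys are hypothetical; the ConfigMap name, namespace, and key names are the ones given above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with AI Factory.

# Build a single-operation JSON Patch document that removes one data key.
remove_patch_for_key() {
  printf '[{"op":"remove","path":"/data/%s"}]' "$1"
}

# Remove each given key from coredns-custom separately, so that a key
# that is already absent does not block removal of the others.
remove_legacy_coredns_keys() {
  local key
  for key in "$@"; do
    if kubectl patch configmap coredns-custom -n kube-system \
         --type=json -p "$(remove_patch_for_key "$key")"; then
      echo "removed: $key"
    else
      echo "not removed (possibly already absent): $key" >&2
    fi
  done
}
```

It would be invoked as remove_legacy_coredns_keys custom.server custom.override. Any error other than a missing key (for example, missing permissions) is reported the same way, so check the kubectl output if CoreDNS still fails to start.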

Operator overview

Postgres credentials

The iaas-api Helm chart generates random PostgreSQL credentials on first install. These survive normal Helm upgrades but are permanently lost if the cluster is destroyed or the Secret is deleted. Without the original credentials, existing S3 backups cannot be decrypted.

To preserve credentials across cluster recreations, extract them before destroying the cluster and add them vault-encrypted to inventory.yml:

# Run from inside the deployment container (./scripts/platform-setup.sh --shell)
kubectl get secret iaas-api-postgresql -n iaas-console -o jsonpath='{.data.password}' | base64 -d
kubectl get secret iaas-api-postgresql -n iaas-console -o jsonpath='{.data.postgres-password}' | base64 -d
kubectl get secret iaas-api-postgresql -n iaas-console -o jsonpath='{.data.replication-password}' | base64 -d
kubectl get secret iaas-api-postgresql -n iaas-console -o jsonpath='{.data.backup-key}' | base64 -d

# inventory.yml
all:
  vars:
    iaas_console:
      postgresql_credentials:
        password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          ...
        postgres_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          ...
        replication_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          ...
        backup_key: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          ...
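
The extraction and encryption steps can be combined into one loop. The sketch below assumes kubectl and ansible-vault are available in the deployment container and that a vault password is already configured for ansible-vault; the function names inventory_key_for and export_postgresql_credentials are hypothetical. It prints one vault-encrypted entry per credential, to be pasted under postgresql_credentials with the indentation adjusted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with AI Factory.

# Map a Secret data key to the inventory variable name used in inventory.yml
# (hyphens become underscores, e.g. postgres-password -> postgres_password).
inventory_key_for() {
  printf '%s' "$1" | tr '-' '_'
}

# Read each credential from the Secret, decode it, and emit it as a
# vault-encrypted YAML entry via ansible-vault encrypt_string.
export_postgresql_credentials() {
  local key
  for key in password postgres-password replication-password backup-key; do
    kubectl get secret iaas-api-postgresql -n iaas-console \
      -o "jsonpath={.data.${key}}" | base64 -d |
      ansible-vault encrypt_string --stdin-name "$(inventory_key_for "$key")"
  done
}
```

Piping the decoded value straight into ansible-vault keeps the cleartext credentials out of the terminal scrollback and shell history.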

Operator reference

The operator overview for this release of AI Factory can be found in the /docs section.

Please contact support@midokura.com for more information.