Phoenix v1.4

· About 5 min read
Sergi Miralles
Software Engineer @ Midokura

We are pleased to announce the v1.4 release of Phoenix.

Overview

This release introduces significant infrastructure automation improvements, enhanced observability with Grafana integration, expanded HedgeHog network management capabilities, and robust Kubernetes cluster provisioning through Magnum. The release focuses on improving deployment reliability, monitoring capabilities, and network infrastructure automation.

Features

Observability & Monitoring

  • Grafana Integration (iaas-api): Added native Grafana configuration to iaas-api for seamless monitoring dashboard connectivity
  • Enhanced Observability Deployment: Increased Helm chart deployment timeout for more reliable observability stack installations
  • Improved Monitoring Configuration: Added comprehensive Grafana variables to inventory management

Network Infrastructure Automation

  • HedgeHog VM Provisioning: Automated download, creation, and provisioning of HedgeHog VMs with ISO installer support and installation log monitoring
  • HedgeHog Manifest Deployment: Automated deployment of HedgeHog network manifests with VPC peering support and boot NIC MAC labeling for IaaS Console integration

Container Orchestration

  • Magnum Kubernetes Templates: Added automatic Kubernetes cluster template provisioning with Kubernetes v1.28.9, OpenStack integrations (Cinder CSI, Keystone, Octavia), and production-ready configurations
  • Enhanced Magnum Support: Enabled Magnum service in both development and QA environments with proper cluster user trust configuration

Image & Asset Management

  • Multi-format Compression Support: Added support for gz, bz2, and xz compressed images using the community.general.decompress module
  • Improved Image Handling: Enhanced image download and decompression with better error handling for both compressed and uncompressed images
  • Fedora CoreOS 38 Support: Added Fedora CoreOS 38 image support with custom image properties
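
To illustrate the compressed and uncompressed image handling described above, here is a minimal Python sketch. It is not the release's implementation, which relies on the community.general.decompress Ansible module; it only shows the idea of picking a decompressor from the file suffix and passing uncompressed images through unchanged. The function name unpack_image and the OPENERS table are hypothetical.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path

# Suffix -> opener; mirrors the gz, bz2 and xz formats named in the release notes.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def unpack_image(src: Path, dest_dir: Path) -> Path:
    """Decompress src into dest_dir if it is compressed, otherwise copy it as-is."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    opener = OPENERS.get(src.suffix)
    if opener is None:
        # Uncompressed image: keep the original file name.
        dest = dest_dir / src.name
        shutil.copyfile(src, dest)
        return dest
    # Compressed image: drop the compression suffix (image.qcow2.xz -> image.qcow2).
    dest = dest_dir / src.stem
    with opener(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

A corrupt archive surfaces as an exception from the opener while reading, so a caller can catch it and report which image failed.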

Development & Operations

  • Script Reorganization: Renamed main.sh to platform-setup.sh for clearer script identification and easier reference
  • Enhanced CLI Experience: Added Phoenix banner and welcome banner for improved user experience
  • Improved Error Reporting: Enhanced Makefile to print paths of missing playbooks for better debugging

Bug Fixes

  • Fixed OpenStack pod resource limits: Increased default PID limit for OpenStack pods to prevent neutron_dhcp_agent exhaustion
  • Fixed image processing: Improved handling of uncompressed images with different filenames and enhanced compressed file validation
  • Fixed Kolla Ansible dependencies: Updated to latest 2025.1 release version of python-ironicclient (5.10.1)
  • Fixed CI/CD workflows: Improved credential handling, proper yq installation, and enhanced workflow path management
  • Fixed inventory management: Corrected bastion inventory location and improved ansible configuration handling
  • Fixed VPC networking: Avoided overlapping subnets between operator and tenant VPCs
  • Fixed deployment scripts: Enhanced vault password management and environment variable handling
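
As an illustration of the VPC subnet overlap fix, the following sketch shows one way to detect overlapping CIDR ranges with Python's standard ipaddress module. It is an assumption about how such a check could look, not the code shipped in Phoenix; the function name and the example CIDRs are made up.

```python
from ipaddress import ip_network
from itertools import product


def overlapping_pairs(operator_cidrs, tenant_cidrs):
    """Return every (operator, tenant) CIDR pair whose address ranges overlap."""
    ops = [ip_network(c) for c in operator_cidrs]
    tenants = [ip_network(c) for c in tenant_cidrs]
    return [(str(o), str(t)) for o, t in product(ops, tenants) if o.overlaps(t)]


# Hypothetical CIDRs: the first tenant subnet sits inside the operator range.
print(overlapping_pairs(["10.0.0.0/16"], ["10.0.128.0/24", "10.1.0.0/24"]))
# prints [('10.0.0.0/16', '10.0.128.0/24')]
```

Note that overlaps only compares networks of the same IP version, so IPv4 and IPv6 ranges would need to be checked separately.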

This release contains 95+ commits focused on infrastructure automation, observability enhancements, and deployment reliability improvements. The changes span multiple components including gpu-infrastructure, iaas-console integration, HedgeHog network management, and OpenStack service expansion.

Merry Christmas! 🎅

Operator reference

This is the reference sheet for Phoenix v1.4, an end-to-end solution to operate private, multi-tenant AI factories. Operators will find below an overview of the materials, infrastructure, and other requirements, and an entry point to the procedure to provision and configure the system.

Please contact support@midokura.com for more information.

System requirements

Note: documentation files referenced here are provided in a downloadable artefact included in the environment setup section.

  • Before proceeding, operators are expected to ensure that the underlying infrastructure meets the system requirements listed below.
  • Operating system requirements for the OpenStack control nodes are available in the documentation file ./service-operator/OS_REQUIREMENTS.md
  • Operators are expected to set up their hardware according to our official Blueprint, specifically with regard to network configuration and port and interface assignment.
  • The base operating system for the OpenStack controllers should be Ubuntu 24.04.
  • Storage. Operators are expected to provide a Ceph cluster, integrated in the infrastructure as defined in the blueprint. See more details in the Environment setup.
  • Set up a new Google Application that will be used as an SSO provider for the IaaS service. To follow this process, consult the ./service-operator/GOOGLE_SSO_SETUP.md file in the documentation bundle described below.
  • Set up credentials for the private registry at ghcr.io/midokura. We will provide you with this token via secure means; it is required during the control plane installation process (for more information, see ./service-operator/GHCR_AUTHENTICATION.md).

Overview

The sections below provide references to the materials required for the provisioning process, which takes place from the Bastion node shown in the blueprint. At a high level, the process is based on:

  • An installer of the network fabric controller.
  • A bundle of Ansible playbooks that will install and configure all components in the control plane.

Environment setup

To install the Phoenix cluster, the Operator will work from the bastion node shown in the blueprint. The materials below must be available on the node before proceeding with the installation.

  1. Create a new directory ./phoenix. This will serve to store artefacts and playbooks. All commands and paths in this document are relative to this directory.
  2. Download and extract the Documentation bundle. We will refer to documentation files from different sections of this document.
  3. Download the Network controller installer ISO and the network fabric configuration:
     a. hedgehog-installer.iso
     b. hedgehog-fabric-configuration.yaml
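
Before starting the provisioning procedure, it can help to confirm that the two files listed above are in place. The sketch below is a hypothetical preflight check, not part of the Phoenix tooling; it assumes both files were downloaded directly into ./phoenix, and the function name is made up.

```python
from pathlib import Path

# File names come from the list above; placing them directly in the working
# directory is an assumption.
REQUIRED_FILES = ("hedgehog-installer.iso", "hedgehog-fabric-configuration.yaml")


def missing_artefacts(workdir: Path) -> list[str]:
    """Return the required file names that are not present in workdir."""
    return [name for name in REQUIRED_FILES if not (workdir / name).is_file()]


if __name__ == "__main__":
    missing = missing_artefacts(Path("./phoenix"))
    if missing:
        print("Missing from ./phoenix:", ", ".join(missing))
    else:
        print("All installer artefacts are present.")
```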

Provisioning procedure

Network fabric setup

  • To install the network fabric controller, follow the instructions in ./service-operator/NETWORK_CONTROL_NODE_SETUP.md

Control plane installation

  • Prepare the Ceph cluster by following the steps explained in the documentation file ./service-operator/CEPH_SETUP.md.
  • Download and extract Ansible playbooks.
  • Download inventory.example.yml as the base to input the configuration specific to your cluster.
  • Execute the playbooks following the instructions in ./service-operator/DEPLOYMENT.md

IaaS Console - Tenant and User configuration

To create additional admin users, register tenants and tenant users, please refer to the instructions in ./service-operator/IAAS_CONSOLE_CONFIGURATION.md

Baremetal Installation

To install a baremetal node, please refer to the instructions in the documentation file: ./service-operator/INSTALL_BAREMETAL_NODE.md