Phoenix v1.5

· 5 min read
Alexander Fandos
Software Engineer @ Midokura

We are pleased to announce the v1.5 release of Phoenix.

Overview

This post includes the updated Operator Reference sheet and the release notes. They describe the revised steps, configuration details, and changes for provisioning and managing a Phoenix cluster under the new release.

Features

Kubernetes Cluster Management Enhancements

Download Cluster Configuration Files

  • Users can now download kubeconfig files directly from the UI for active Kubernetes clusters
  • This enables immediate access to clusters using standard Kubernetes tools like kubectl
  • The download button appears in the cluster details view for ready clusters
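Once downloaded, the kubeconfig can be used with standard tooling right away. A minimal sketch (the filename below is illustrative; use whatever name the UI gives the downloaded file):

```shell
# Point standard Kubernetes tooling at the downloaded credentials.
# Filename is a placeholder for the file downloaded from the UI.
export KUBECONFIG="$PWD/my-cluster-kubeconfig.yaml"
# With KUBECONFIG set, kubectl talks to that cluster, e.g.:
# kubectl get nodes
```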

Improved Cluster Information Display

  • Enhanced cluster details page with more comprehensive information:
    • Cluster Health Status: Visual indicators showing whether clusters are healthy, unhealthy, or in an unknown state
    • Detailed Cluster Labels: Displays Kubernetes version, container runtime, network plugin, and cloud provider information
    • Node Health Details: Expandable view showing health status for individual cluster nodes
    • Updated Timestamps: Shows both creation and last update dates
  • Clusters table now shows master count instead of creation date for better at-a-glance information

Configurable Cluster Settings

  • System administrators can now configure cluster templates and settings through configuration files
  • No longer requires code changes to adjust cluster creation parameters
  • More flexible deployment options for different environments
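As a purely illustrative sketch of what file-based configuration looks like: the file name, location, and keys below are hypothetical, and the authoritative format is described in the Operator Reference.

```shell
# Hypothetical cluster-creation defaults; actual file name, location,
# and keys are deployment-specific.
cat > cluster-template.conf <<'EOF'
master_count=1
node_flavor=m1.large
network_plugin=calico
EOF
```

Adjusting such a file changes cluster creation parameters without touching code, which is what makes per-environment deployments easier.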

Security Improvements

SSH Key Validation

  • The system now validates SSH public keys when users add them
  • Invalid or malformed keys are rejected with clear error messages
  • Prevents issues that could occur later when using keys for server access
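A minimal sketch of the kind of check now performed when a key is added (the product's actual validation may be stricter): an OpenSSH public key line has the form `<type> <base64-blob> [comment]`, so a basic validator can check the type and that the blob decodes as base64.

```shell
# Sketch only: check key type and that the key material is valid base64.
is_valid_pubkey() {
  keytype=$(printf '%s' "$1" | awk '{print $1}')
  blob=$(printf '%s' "$1" | awk '{print $2}')
  case "$keytype" in
    ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp256|ecdsa-sha2-nistp384|ecdsa-sha2-nistp521) ;;
    *) return 1 ;;                        # unknown key type
  esac
  [ -n "$blob" ] || return 1              # missing key material
  printf '%s' "$blob" | base64 -d >/dev/null 2>&1   # must decode as base64
}
```

Rejecting malformed keys at submission time means failures surface immediately with a clear message instead of later, during server access.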

Infrastructure Automation

Enhanced Compute Infrastructure

  • Support for compute-only nodes, enabling more flexible infrastructure scaling
  • Better integration between IaaS Console and Kubernetes cluster management (Magnum)

Improved Deployment Experience

  • Deployment scripts now work from any directory, making setup easier
  • More consistent inventory management across the platform
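A common technique for making a script runnable from any directory is to resolve the script's own location first; this is a sketch of the general pattern, not necessarily what Phoenix's scripts do:

```shell
# Resolve the directory containing this script, regardless of where the
# caller invoked it from.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Paths inside the script can then be anchored to "$SCRIPT_DIR" instead
# of the caller's working directory.
```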

Bug Fixes

Application Fixes

  • Fixed an issue with OpenStack client operations that could affect multi-tenant environments
  • Improved reliability of cluster and infrastructure operations

Infrastructure Reliability Fixes

  • Memory Leak Fix: Resolved memory leak in OpenStack networking that required daily restarts
  • Boot Configuration: Fixed GRUB update issues that could prevent servers from booting correctly
  • Hostname Resolution: Fixed hostname resolution problems that were affecting various services
  • Virtual Machine Issues: Resolved KVM VM hostname resolution problems
  • System Configuration: Fixed formatting issues in system configuration files

Configuration Fixes

  • Fixed image download and storage configuration issues
  • Corrected Magnum network driver configuration
  • Fixed CoreDNS Kubernetes external service configuration

Performance & Reliability

  • Faster application startup - Grafana integration no longer blocks the application from starting
  • More reliable cluster operations with improved timeout handling
  • Better error handling and retry logic for infrastructure operations
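The retry logic mentioned above typically follows a retry-with-backoff pattern; the function below illustrates the general idea and is not Phoenix's actual implementation:

```shell
# Retry a command up to $1 times, doubling the delay between attempts.
retry() {
  attempts=$1; shift
  delay=1 i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    sleep "$delay"
    delay=$((delay * 2)) i=$((i + 1))
  done
}
```

For example, `retry 5 curl -fsS https://example.com/health` (URL hypothetical) would attempt the check up to five times with growing delays before giving up.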

Operator Reference

This is the reference sheet for Phoenix v1.5, an end-to-end solution to operate private, multi-tenant AI factories. Below, operators will find an overview of the required materials, infrastructure, and other prerequisites, along with an entry point to the procedure for provisioning and configuring the system.

Please contact support@midokura.com for more information.

System requirements

Note: documentation files referenced here are provided in a downloadable artefact included in the environment setup section.

  • Before proceeding, operators are expected to ensure that the underlying infrastructure meets the system requirements listed below.
  • Operating system requirements for the OpenStack control nodes are available in the documentation file ./service-operator/OS_REQUIREMENTS.md
  • Operators are expected to set up their hardware according to our official Blueprint, specifically with regard to network configuration, port and interface assignment.
    • Base Operating System for the OpenStack controllers should be ubuntu-24.04
  • Storage: Operators are expected to provide a Ceph cluster, integrated into the infrastructure as defined in the blueprint. See more details in the Environment setup section.
  • Set up a new Google Application that will be used as an SSO provider for the IaaS service. To follow this process, consult the ./service-operator/GOOGLE_SSO_SETUP.md file in the documentation bundle described below.
  • Set up credentials for the private registry at ghcr.io/midokura. We will provide you with this token via secure means, and it will be required during the control plane installation process (more info in ./service-operator/GHCR_AUTHENTICATION.md).

Overview

The sections below provide references to the materials required for the provisioning process, which takes place from the Bastion node shown in the blueprint. At a high level, the process is based on:

  • An installer of the network fabric controller.
  • A bundle of Ansible playbooks that will install and configure all components in the control plane.

Environment setup

To install the Phoenix cluster, the Operator will work from the bastion node shown in the blueprint. The materials below must be available on the node before proceeding with the installation.

  1. Create a new directory ./phoenix. It will store the artefacts and playbooks. All commands and paths in this document are relative to this directory.
  2. Download and extract the Documentation bundle. We will refer to documentation files from different sections of this document.
  3. Download the Network controller installer ISO and the network fabric configuration:
    • hedgehog-installer.iso
    • hedgehog-fabric-configuration.yaml

Provisioning procedure

Network fabric setup

  • To install the network fabric controller, follow the instructions in ./service-operator/NETWORK_CONTROL_NODE_SETUP.md

Control plane installation

  • Prepare the Ceph cluster by following the steps explained in the documentation file ./service-operator/CEPH_SETUP.md.
  • Download and extract Ansible playbooks.
  • Download inventory.example.yml as the base to input the configuration specific to your cluster.
  • Execute the playbooks following the instructions in ./service-operator/DEPLOYMENT.md.
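As a rough illustration of the inventory step: the downloaded inventory.example.yml is copied and filled in with cluster-specific values. The hostnames, addresses, and keys below are placeholders; the authoritative structure is the example file itself, and the exact commands are in ./service-operator/DEPLOYMENT.md.

```shell
# Hypothetical inventory shape; use inventory.example.yml from the
# playbook bundle as the authoritative base.
cat > inventory.yml <<'EOF'
all:
  hosts:
    controller-01:              # placeholder hostname
      ansible_host: 10.0.0.11   # placeholder address
EOF
# Deployment is then driven by the playbooks per DEPLOYMENT.md, e.g.:
# ansible-playbook -i inventory.yml <playbook>.yml
```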

IaaS Console - Tenant and User configuration

To create additional admin users, register tenants and tenant users, please refer to the instructions in ./service-operator/IAAS_CONSOLE_CONFIGURATION.md

Baremetal Installation

To install a baremetal node, please refer to the instructions in the documentation file: ./service-operator/INSTALL_BAREMETAL_NODE.md