Configure Horizontal Pod Autoscaler
This document explains how to enable and configure Horizontal Pod Autoscaler (HPA) for Digital.ai Release running on Kubernetes.
What Is Horizontal Pod Autoscaling?
Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically adjusts the number of running pods in a deployment based on observed resource utilization, such as CPU or memory usage.
By scaling pods up during periods of high demand and scaling them down when demand decreases, HPA helps maintain application performance while optimizing resource usage.
How Horizontal Pod Autoscaling Works in Digital.ai Release
Starting with Release 25.3.0, Digital.ai Release supports Kubernetes-native Horizontal Pod Autoscaling for Release pods.
Autoscaling is configured as part of the Release deployment specification and uses standard Kubernetes mechanisms. Digital.ai Release does not introduce custom autoscaling logic or controllers; instead, it integrates directly with Kubernetes HPA.
Release pods are scaled based on the resource metrics and thresholds you configure, allowing the system to adapt dynamically to workload changes while maintaining predictable and controlled scaling behavior.
Prerequisites
Before enabling Horizontal Pod Autoscaling, ensure the following:
- Kubernetes cluster - A running Kubernetes cluster. Horizontal Pod Autoscaling is a built-in Kubernetes API resource and does not require additional installation.
- Metrics server - The metrics.k8s.io API must be available, typically provided by the metrics-server add-on.
- Resource requests and limits - Release pods must define CPU and memory requests and limits. HPA calculates utilization based on the requested values, as shown in the sketch after this list. For more information, see Configure Resource Sizing.
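Because HPA expresses utilization as a percentage of the requested values, requests must be present on the Release pods. The following is a minimal sketch of a standard Kubernetes resources block with placeholder values; the exact location of these settings in the Release custom resource is covered in Configure Resource Sizing:

resources:
  requests:          # HPA utilization is measured against these values
    cpu: "1"         # placeholder value
    memory: 4Gi      # placeholder value
  limits:
    cpu: "2"         # placeholder value
    memory: 4Gi      # placeholder value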
Autoscaling Configuration
Autoscaling is configured by setting values under the spec.autoscaling key. These settings control whether autoscaling is enabled, the allowed replica range, and the resource metrics used to trigger scaling events.
Example Configuration
The following example enables autoscaling and configures both memory- and CPU-based scaling:
spec:
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 5
    metrics:
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 75
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 220
With the example configuration:
- A steady baseline of 3 pods is always maintained
- The deployment can scale up to 5 pods
- Sustained memory pressure is the most likely trigger for scaling
- CPU-based scaling occurs only under sustained or extreme CPU usage
This configuration maintains a fixed baseline of pods while allowing the system to scale under sustained resource pressure. For more information about the configuration, see How Autoscaling Configuration Affects Release Pods.
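Because Digital.ai Release delegates scaling to the standard Kubernetes HPA, the example above corresponds to an autoscaling/v2 HorizontalPodAutoscaler object. The following sketch shows roughly what the resulting object looks like; the metadata and scaleTargetRef names are illustrative, and the exact resource rendered in your cluster may differ:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: release-hpa              # illustrative name
spec:
  scaleTargetRef:                # the Release workload being scaled
    apiVersion: apps/v1
    kind: Deployment             # or StatefulSet, depending on the deployment
    name: release                # illustrative name
  minReplicas: 3
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource             # CPU metric, configured the same way
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 220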
Applying Changes to the Release Custom Resource
After you configure the autoscaling settings in your Release custom resource, you must apply the changes to your Kubernetes cluster.
For detailed instructions on how to apply configuration changes, see Update Parameters in the Custom Resource File.
How Autoscaling Configuration Affects Release Pods
This section explains how each autoscaling setting affects scaling behavior for Digital.ai Release pods.
Enabling Autoscaling
enabled: true
Turns on Kubernetes Horizontal Pod Autoscaler behavior for the Release component.
When autoscaling is disabled, the deployment runs with a fixed number of replicas and ignores all autoscaling-related settings.
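For example, to run with a fixed replica count, leave autoscaling disabled (a minimal sketch; the fixed replica count itself is configured elsewhere in the custom resource):

spec:
  autoscaling:
    enabled: false   # replicas stay at the statically configured count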
Minimum and Maximum Replicas
minReplicas: 3
Ensures that at least three pods are always running. The deployment will not scale below this value, even during periods of low load.
maxReplicas: 5
Limits the maximum number of pods to five, even if resource utilization exceeds the configured thresholds.
Together, these values define the allowed scaling range for the Release pods.
Resource Metrics and Scaling Logic
The metrics section defines which resource signals are used to trigger scaling decisions.
When multiple metrics are configured:
- Horizontal Pod Autoscaler calculates a desired replica count for each metric independently
- The highest recommended replica count is applied
This ensures that scaling satisfies the most constrained resource.
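For each configured metric, the standard Kubernetes HPA derives its recommendation using:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

For example, if 3 pods average 90% memory utilization against the 75% target above, the memory metric recommends ceil(3 × 90 / 75) = ceil(3.6) = 4 pods, while CPU utilization well below its 220% target recommends no change; the higher recommendation (4) is applied.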
Memory-Based Scaling
averageUtilization: 75
This setting instructs HPA to maintain average memory usage at approximately 75% of each pod’s requested memory.
When sustained memory utilization exceeds this threshold, HPA increases the number of pods to distribute the load more evenly. In this configuration, memory is the primary driver for scaling events.
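For example, if each Release pod requests 4Gi of memory (an illustrative value, matching the sketch in Prerequisites), HPA begins adding pods once average usage across the pods stays above roughly 3Gi (75% of 4Gi).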
CPU-Based Scaling
averageUtilization: 220
This setting causes HPA to scale based on CPU usage only when average CPU utilization exceeds 220% of the requested CPU.
Using a higher CPU threshold delays scaling on short CPU spikes, making CPU a secondary scaling signal compared to memory.
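Utilization above 100% is possible because HPA measures usage against the pod's CPU request rather than its limit. For example, with an illustrative request of 500m CPU per pod, the 220% target means scaling starts only when average usage exceeds 1100m, which is reachable only if the CPU limit is set well above the request or left unset.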
Scaling Behavior and Outcome
Horizontal Pod Autoscaling follows standard Kubernetes scaling rules:
- Scale up occurs when one or more metrics consistently exceed their target utilization values.
- Scale down occurs only when all metrics remain below their targets and after HPA stabilization windows are satisfied.
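By default, Kubernetes applies a scale-down stabilization window of 300 seconds, so the HPA removes pods only after utilization has stayed below target for a sustained period; this prevents replica counts from flapping during brief dips in load.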