Configure Horizontal Pod Autoscaler
This document explains how to enable and configure Horizontal Pod Autoscaler (HPA) for Digital.ai Release running on Kubernetes.
What Is Horizontal Pod Autoscaling?
Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically adjusts the number of running pods in a deployment based on observed resource utilization, such as CPU or memory usage.
By scaling pods up during periods of high demand and scaling them down when demand decreases, HPA helps maintain application performance while optimizing resource usage.
How Horizontal Pod Autoscaling Works in Digital.ai Release
Starting with Release 25.3.0, Digital.ai Release supports Kubernetes-native Horizontal Pod Autoscaling for Release pods.
Autoscaling is configured as part of the Release deployment specification and uses standard Kubernetes mechanisms. Digital.ai Release does not introduce custom autoscaling logic or controllers; instead, it integrates directly with Kubernetes HPA.
Release pods are scaled based on the resource metrics and thresholds you configure, allowing the system to adapt dynamically to workload changes while maintaining predictable and controlled scaling behavior.
Prerequisites
Before enabling Horizontal Pod Autoscaling, ensure the following:
- Kubernetes cluster - A running Kubernetes cluster. Horizontal Pod Autoscaling is a built-in Kubernetes API resource and does not require additional installation.
- Metrics server - The metrics.k8s.io API must be available, typically provided by the metrics-server add-on.
- Resource requests and limits - Release pods must define CPU and memory requests and limits. HPA calculates utilization based on the requested values, as shown in the sketch after this list. For more information, see Configure Resource Sizing.
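Because HPA expresses utilization as a percentage of the requested values, requests must be present on the Release pods. The following is a minimal sketch of a standard Kubernetes resources block with placeholder values; the exact location of these settings in the Release custom resource is covered in Configure Resource Sizing:

resources:
  requests:          # HPA utilization is measured against these values
    cpu: "1"         # placeholder value
    memory: 4Gi      # placeholder value
  limits:
    cpu: "2"         # placeholder value
    memory: 4Gi      # placeholder value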
Autoscaling Configuration
Autoscaling is configured by setting values under the spec.autoscaling key. These settings control whether autoscaling is enabled, the allowed replica range, and the resource metrics used to trigger scaling events.
Example Configuration
The following example enables autoscaling and configures both memory- and CPU-based scaling:
spec:
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 5
    metrics:
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 75
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 220
With the example configuration:
- A steady baseline of 3 pods is always maintained
- The deployment can scale up to 5 pods
- Sustained memory pressure is the most likely trigger for scaling
- CPU-based scaling occurs only under sustained or extreme CPU usage
This configuration maintains a fixed baseline of pods while allowing the system to scale under sustained resource pressure. For more information about the configuration, see How Autoscaling Configuration Affects Release Pods.
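Because Digital.ai Release delegates scaling to the standard Kubernetes HPA, the example above corresponds to an autoscaling/v2 HorizontalPodAutoscaler object. The following sketch shows roughly what the resulting object looks like; the metadata and scaleTargetRef names are illustrative, and the exact resource rendered in your cluster may differ:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: release-hpa              # illustrative name
spec:
  scaleTargetRef:                # the Release workload being scaled
    apiVersion: apps/v1
    kind: Deployment             # or StatefulSet, depending on the deployment
    name: release                # illustrative name
  minReplicas: 3
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
    - type: Resource             # CPU metric, configured the same way
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 220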
Applying Changes to the Release Custom Resource
After you configure the autoscaling settings in your Release custom resource, you must apply the changes to your Kubernetes cluster.
For detailed instructions on how to apply configuration changes, see Update Parameters in the Custom Resource File.
How Autoscaling Configuration Affects Release Pods
This section explains how each autoscaling setting affects scaling behavior for Digital.ai Release pods.
Enabling Autoscaling
enabled: true
Turns on Kubernetes Horizontal Pod Autoscaler behavior for the Release component.
When autoscaling is disabled, the deployment runs with a fixed number of replicas and ignores all autoscaling-related settings.
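For example, to run with a fixed replica count, leave autoscaling disabled (a minimal sketch; the fixed replica count itself is configured elsewhere in the custom resource):

spec:
  autoscaling:
    enabled: false   # replicas stay at the statically configured count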
Minimum and Maximum Replicas
minReplicas: 3
Ensures that at least three pods are always running. The deployment will not scale below this value, even during periods of low load.
maxReplicas: 5
Limits the maximum number of pods to five, even if resource utilization exceeds the configured thresholds.
Together, these values define the allowed scaling range for the Release pods.
Resource Metrics and Scaling Logic
The metrics section defines which resource signals are used to trigger scaling decisions.
When multiple metrics are configured:
- Horizontal Pod Autoscaler calculates a desired replica count for each metric independently
- The highest recommended replica count is applied
This ensures that scaling satisfies the most constrained resource.
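For each configured metric, the standard Kubernetes HPA derives its recommendation using:

desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)

For example, if 3 pods average 90% memory utilization against the 75% target above, the memory metric recommends ceil(3 × 90 / 75) = ceil(3.6) = 4 pods, while CPU utilization well below its 220% target recommends no change; the higher recommendation (4) is applied.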
Memory-Based Scaling
averageUtilization: 75
This setting instructs HPA to maintain average memory usage at approximately 75% of each pod’s requested memory.
When sustained memory utilization exceeds this threshold, HPA increases the number of pods to distribute the load more evenly. In this configuration, memory is the primary driver for scaling events.
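For example, if each Release pod requests 4Gi of memory (an illustrative value, matching the sketch in Prerequisites), HPA begins adding pods once average usage across the pods stays above roughly 3Gi (75% of 4Gi).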
CPU-Based Scaling
averageUtilization: 220
This setting causes HPA to scale based on CPU usage only when average CPU utilization exceeds 220% of the requested CPU.
Using a higher CPU threshold delays scaling on short CPU spikes, making CPU a secondary scaling signal compared to memory.
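Utilization above 100% is possible because HPA measures usage against the pod's CPU request rather than its limit. For example, with an illustrative request of 500m CPU per pod, the 220% target means scaling starts only when average usage exceeds 1100m, which is reachable only if the CPU limit is set well above the request or left unset.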
Scaling Behavior and Outcome
Horizontal Pod Autoscaling follows standard Kubernetes scaling rules:
- Scale up occurs when one or more metrics consistently exceed their target utilization values.
- Scale down occurs only when all metrics remain below their targets and after HPA stabilization windows are satisfied.
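By default, Kubernetes applies a scale-down stabilization window of 300 seconds, so the HPA removes pods only after utilization has stayed below target for a sustained period; this prevents replica counts from flapping during brief dips in load.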