Horizontal Pod Autoscaler (HPA)
Introduction
HPA (Horizontal Pod Autoscaler) automatically scales Kubernetes Pods horizontally and is itself an API object in Kubernetes. With this component, a cluster can use monitoring metrics (such as CPU usage) to automatically scale the number of Pods in a service out or in. When business load increases, HPA adds Pods to the service, improving system stability. When business load declines, HPA removes Pods from the service, reducing the amount of cluster resources the service requests (Request). Combined with Cluster Autoscaler, it can also scale the cluster itself automatically, saving IT costs.
Note that by default HPA only supports scaling based on CPU and memory thresholds. It can also query Prometheus through the custom metrics API to scale on custom metrics, which allows elastic scaling based on more flexible monitoring indicators. HPA cannot be applied to controllers that cannot be scaled, such as DaemonSet.
Working Principle
HPA is implemented as a controller in Kubernetes and can be created simply with the kubectl autoscale command. The HPA controller periodically queries the resource usage of the specified target (Deployment, RC) and compares it with the target value set when the HPA was created, scaling automatically on that basis. The default interval is 30 seconds in older Kubernetes releases and 15 seconds in current ones; it is set with the --horizontal-pod-autoscaler-sync-period parameter.
After the HPA is created, it obtains the average utilization of the Pods in a given Deployment from the Metrics Server (Heapster is not used in UK8S), compares it with the target defined in the HPA, calculates the number of replicas needed, and performs the scaling. Its algorithm is roughly as follows:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
For example, if the current average CPU usage across all Pods is 200m and the desired value is 100m, the number of replicas doubles. If the current value is 50m, the number of replicas is halved.
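The formula above can be sketched in a few lines of Python. This is an illustrative calculation only, not the controller's actual implementation; the function name `desired_replicas` and the starting replica counts are assumptions chosen for the example.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     desired_metric: float) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / desired)."""
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    return math.ceil(current_replicas * current_metric / desired_metric)


if __name__ == "__main__":
    # 200m observed against a 100m target: replicas double.
    print(desired_replicas(2, 200, 100))
    # 50m observed against a 100m target: replicas are halved.
    print(desired_replicas(4, 50, 100))
```

Because the result is rounded up, a fractional replica count always resolves to the next whole Pod.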
Note that the HPA controller has a concept of tolerance. When the ratio currentMetricValue / desiredMetricValue is close to 1.0, no scaling is triggered. The default tolerance is 0.1, which protects system stability by preventing the cluster from oscillating. For example, if the HPA target is 50% CPU usage, scale-out is triggered only when usage rises above 55%, and scale-in only when it falls below 45%. HPA scales the Pods to try to keep their usage within this 45%~55% range. You can adjust the tolerance with the --horizontal-pod-autoscaler-tolerance parameter.
After each scaling operation there is a window during which no further scaling is performed, similar to the cooldown time of a skill. By default the window is 3 minutes for scaling up (--horizontal-pod-autoscaler-upscale-delay) and 5 minutes for scaling down (--horizontal-pod-autoscaler-downscale-delay). These two parameters apply to older Kubernetes releases; newer releases use --horizontal-pod-autoscaler-downscale-stabilization (5 minutes by default) and the behavior field of the HPA object instead.
Finally, note that utilization-based HPA does not work when the containers in the Pod do not set a resource Request, because utilization is calculated as a percentage of the Request.
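The tolerance check can be illustrated as follows. This is a minimal sketch assuming the default tolerance of 0.1 and a 50% CPU target; the function name `should_scale` is hypothetical and the real controller applies further conditions.

```python
def should_scale(current_metric: float, desired_metric: float,
                 tolerance: float = 0.1) -> bool:
    """Return True when the metric ratio leaves the tolerance band around 1.0."""
    ratio = current_metric / desired_metric
    return abs(ratio - 1.0) > tolerance


if __name__ == "__main__":
    target = 50  # target CPU utilization in percent (assumed for the example)
    for observed in (47, 53, 60, 40):
        print(observed, should_scale(observed, target))
```

Values of 47% and 53% stay inside the band and leave the replica count alone, while 60% and 40% fall outside it and trigger scaling.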
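The cooldown behavior described above can be modeled roughly like this. It is a simplified sketch of the text, not the controller's code: the function name `cooldown_elapsed` and the way elapsed time is passed in are assumptions.

```python
UPSCALE_DELAY_S = 3 * 60    # window before another scale-up, per the text
DOWNSCALE_DELAY_S = 5 * 60  # window before another scale-down, per the text


def cooldown_elapsed(direction: str, seconds_since_last_scale: float) -> bool:
    """Return True when enough time has passed to scale in the given direction."""
    delays = {"up": UPSCALE_DELAY_S, "down": DOWNSCALE_DELAY_S}
    if direction not in delays:
        raise ValueError("direction must be 'up' or 'down'")
    return seconds_since_last_scale >= delays[direction]


if __name__ == "__main__":
    # 200 seconds after the last scaling: scale-up allowed, scale-down not yet.
    print(cooldown_elapsed("up", 200), cooldown_elapsed("down", 200))
```

The longer scale-down window makes the autoscaler quicker to add capacity than to remove it.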
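The reason a Request is needed becomes clear when utilization is written out: it is usage expressed as a percentage of the Request. The sketch below assumes CPU values in millicores; the function name `cpu_utilization_percent` is hypothetical.

```python
from typing import Optional


def cpu_utilization_percent(usage_millicores: float,
                            request_millicores: Optional[float]) -> Optional[float]:
    """Return usage as a percentage of the request, or None without a request."""
    if not request_millicores:
        # No Request set: there is nothing to measure utilization against.
        return None
    return 100.0 * usage_millicores / request_millicores


if __name__ == "__main__":
    print(cpu_utilization_percent(100, 200))   # Pod with a 200m request
    print(cpu_utilization_percent(100, None))  # Pod without a request
```

Without a Request the utilization is undefined, so the HPA has no value to compare with its target.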
HPA Object Console Management
The addition, viewing, and deletion of HPA objects can be performed on the Elastic Scaling (HPA) subpage of the Cluster Scaling page of the UK8S cluster management console.
Click Form Add to add an HPA object through the console form, or add one through YAML.
Configuration Item | Description |
---|---|
Namespace | The namespace to which the HPA object belongs |
HPA Object Name | The name must start with a lowercase letter and can only contain lowercase letters, numbers, periods (.) and hyphens (-) |
Application Type | Supports Deployment and StatefulSet controllers |
Application Name | Select the Deployment or StatefulSet object to be scaled |
Expansion Threshold | Scale-in and scale-out thresholds, supporting CPU and memory utilization |
Scale Interval | Minimum and maximum number of Pod replicas |
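The replica range bounds whatever replica count the scaling calculation produces. A minimal sketch of that clamping, with a hypothetical function name `clamp_replicas` and example bounds of 1 and 5:

```python
def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Keep the desired replica count inside the configured range."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # A calculated value of 8 is capped at the maximum of 5.
    print(clamp_replicas(8, 1, 5))
    # A calculated value of 0 is raised to the minimum of 1.
    print(clamp_replicas(0, 1, 5))
```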
Detailed explanation of HPA API objects
The UK8S console creates HPA objects through the Kubernetes API version autoscaling/v2beta2.
Note: For clusters earlier than version 1.26, please use autoscaling/v2beta2. For clusters of version 1.26 and later, please use autoscaling/v2.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginxtest
  namespace: default
spec:
  maxReplicas: 5 # Maximum number of replicas
  minReplicas: 1 # Minimum number of replicas
  metrics:
  # CPU utilization that triggers scaling
  - type: Resource
    resource:
      name: cpu
      target:
        averageUtilization: 50
        type: Utilization
  # Memory utilization that triggers scaling
  - type: Resource
    resource:
      name: memory
      target:
        averageUtilization: 50
        type: Utilization
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment # Type of resource to be scaled
    name: nginxtest # Name of the resource to be scaled
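When the same manifest has to work across cluster versions, it can be generated rather than written by hand. The sketch below is one possible approach, not part of UK8S: the helper names `hpa_api_version` and `build_hpa` are hypothetical, only the CPU metric is included for brevity, and the version rule reflects the removal of autoscaling/v2beta2 in Kubernetes 1.26.

```python
import json


def hpa_api_version(minor: int) -> str:
    """Return the HPA apiVersion for a Kubernetes 1.<minor> cluster."""
    return "autoscaling/v2" if minor >= 26 else "autoscaling/v2beta2"


def build_hpa(name: str, namespace: str, minor: int,
              min_replicas: int = 1, max_replicas: int = 5,
              cpu_percent: int = 50) -> dict:
    """Build an HPA manifest that targets a Deployment of the same name."""
    return {
        "apiVersion": hpa_api_version(minor),
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_percent},
                },
            }],
            "scaleTargetRef": {"apiVersion": "apps/v1",
                               "kind": "Deployment", "name": name},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(build_hpa("nginxtest", "default", minor=27), indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f, in the same way as the YAML form.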
Case Study
Below we will use a simple example to see how HPA works.
1. Deploy test application
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/hpa-example.yaml
This is a compute-intensive PHP application, and the code example is as follows:
<?php
$x = 0.0001;
// Burn CPU with repeated square-root calculations
for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
}
echo "OK!";
?>
2. Enable HPA for test application
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/hpa.yaml
3. Deploy pressure test tool
kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/load.yaml
The pressure test tool is a busybox container. After the container starts, it requests the test application in an endless loop.
while true; do wget -q -O- http://hap-example.default.svc.cluster.local; done
4. Check the load situation of the test application
kubectl top pods | grep hpa-example
5. When the average CPU load of the test application exceeds 55%, HPA starts to scale out the Pods.
kubectl get deploy | grep hpa-example