Horizontal Pod Autoscaler (HPA)

Introduction

HPA (Horizontal Pod Autoscaler) automatically scales the number of Pods in a workload horizontally, and is itself an API object in Kubernetes. With it, a Kubernetes cluster can use monitoring metrics (such as CPU utilization) to automatically increase or decrease the number of Pods in a service. When business load rises, HPA adds Pods to the service, which improves system stability. When business load falls, HPA removes Pods, which reduces the amount of cluster resources the service requests (Request). Combined with Cluster Autoscaler, it can also scale the cluster itself automatically, saving IT costs.

Note that by default HPA only supports scaling on CPU and memory thresholds. It can also use custom metrics, for example by querying Prometheus through the custom metrics API, to scale elastically on more flexible monitoring indicators. HPA cannot be used with controllers that cannot be scaled, such as DaemonSet.

Working Principle

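HPA is implemented as a controller in Kubernetes, and an HPA object can be created simply with the kubectl autoscale command. By default, the HPA controller queries the resource usage of the specified target (Deployment, RC) once every 30 seconds and compares it with the target value set when the HPA was created, which is how automatic scaling is implemented. The interval is controlled by the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period; recent upstream Kubernetes releases default to 15 seconds.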

After an HPA is created, it obtains the average utilization of the Pods in the target Deployment from Metrics Server (Heapster is not used in UK8S), compares it with the target defined in the HPA, calculates the number of replicas needed, and performs the scaling. The algorithm is roughly as follows:


desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For example, if the current average CPU usage across all Pods is 200m and the desired value is 100m, the number of replicas doubles. If the current value is 50m, the number of replicas is halved.
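
The formula and the two cases above can be checked with a short calculation. This is a minimal Python sketch, not the controller's actual code; the function name desired_replicas and the starting count of 4 replicas are assumptions chosen for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    if desired_metric <= 0:
        raise ValueError("desired_metric must be positive")
    return math.ceil(current_replicas * current_metric / desired_metric)


# Hypothetical starting point: 4 replicas with a target of 100m CPU per Pod.
doubled = desired_replicas(4, 200, 100)  # usage is twice the target
halved = desired_replicas(4, 50, 100)    # usage is half the target
print(doubled, halved)
```

Because the result is rounded up, a fractional result such as 4.5 replicas becomes 5.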

Note that the HPA controller has a concept of tolerance: when the ratio currentMetricValue / desiredMetricValue is close to 1.0, no scaling is triggered. The default tolerance is 0.1, which keeps the system stable and avoids oscillation. For example, if the HPA target is 50% CPU utilization, scale-up is triggered only when utilization rises above 55%, and scale-down only when it falls below 45%; HPA scales the Pods to try to keep utilization within the 45%~55% range. You can adjust the tolerance with the --horizontal-pod-autoscaler-tolerance parameter.
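
The tolerance band can be sketched as a check applied before the replica formula. This is an illustrative Python sketch under the default tolerance of 0.1; the function name scale_decision and the replica counts are assumptions, and the real controller applies further rules not shown here.

```python
import math


def scale_decision(current_replicas: int, current_util: float, target_util: float,
                   tolerance: float = 0.1) -> int:
    """Return the replica count after applying the tolerance band."""
    ratio = current_util / target_util
    # Inside the band (roughly 0.9 to 1.1 with the default tolerance): no change.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Outside the band: apply the ceil formula from the previous section.
    return math.ceil(current_replicas * current_util / target_util)


# Hypothetical workload: 10 replicas with a 50% CPU utilization target.
for util in (46, 54, 56, 40):
    print(util, scale_decision(10, util, 50))
```

With a 50% target, 46% and 54% fall inside the band and leave the replica count alone, while 56% and 40% fall outside it and trigger a recalculation.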

After each scaling operation there is a cooldown window, during which no further scaling operations are performed. By default the window is 3 minutes for scale-up (--horizontal-pod-autoscaler-upscale-delay) and 5 minutes for scale-down (--horizontal-pod-autoscaler-downscale-delay). Newer Kubernetes releases replace these two flags with --horizontal-pod-autoscaler-downscale-stabilization (default 5 minutes) and the behavior field of the HPA spec.
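
The window logic can be pictured as a simple time comparison. The following Python sketch only illustrates the idea described above using the 3-minute and 5-minute defaults; the function name scaling_allowed and the use of plain seconds are assumptions, not the controller's implementation.

```python
from typing import Optional

UPSCALE_DELAY_SECONDS = 3 * 60    # default scale-up window from the text
DOWNSCALE_DELAY_SECONDS = 5 * 60  # default scale-down window from the text


def scaling_allowed(now: float, last_scale_time: Optional[float], scaling_up: bool) -> bool:
    """Return True if enough time has passed since the last scaling operation."""
    if last_scale_time is None:
        return True  # nothing has been scaled yet, so no window applies
    delay = UPSCALE_DELAY_SECONDS if scaling_up else DOWNSCALE_DELAY_SECONDS
    return now - last_scale_time >= delay


# Hypothetical timeline: the last scaling happened at t=0 seconds.
print(scaling_allowed(200, 0, scaling_up=True))   # past the 3-minute window
print(scaling_allowed(200, 0, scaling_up=False))  # still inside the 5-minute window
```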

Finally, note that utilization-based HPA does not work if the Pod's containers do not set a resource Request, because utilization is calculated as a percentage of the requested amount.
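
The dependency on Request follows from how utilization is defined: usage divided by the requested amount. A minimal Python sketch of that relationship is shown below; the function name and the sample values (250m usage, 500m request) are hypothetical.

```python
from typing import Optional


def cpu_utilization_percent(usage_millicores: float,
                            request_millicores: Optional[float]) -> Optional[float]:
    """Utilization is usage as a percentage of the request; undefined without a request."""
    if not request_millicores:
        # No request (None or 0): there is nothing to divide by, so no utilization figure.
        return None
    return 100.0 * usage_millicores / request_millicores


print(cpu_utilization_percent(250, 500))   # container that requests 500m
print(cpu_utilization_percent(250, None))  # container without a request
```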

HPA Object Console Management

The addition, viewing, and deletion of HPA objects can be performed on the Elastic Scaling (HPA) subpage of the Cluster Scaling page of the UK8S cluster management console.

Click Form Add to add an HPA object through the console form, or add it through YAML.

Configuration Item	Description
Namespace	The namespace to which the HPA object belongs
HPA Object Name	The name must start with a lowercase letter and can only contain lowercase letters, numbers, periods (.) and hyphens (-)
Application Type	Supports Deployment and StatefulSet controllers
Application Name	Select the Deployment or StatefulSet object to be autoscaled
Expansion Threshold	Thresholds that trigger scale-out and scale-in; CPU and memory utilization are supported
Scale Interval	Range of the number of Pod replicas (minimum and maximum)
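
The naming rule in the table can be expressed as a regular expression. The Python sketch below encodes only the rule as stated in the table; the helper name is_valid_hpa_name is hypothetical, and the console or the Kubernetes API may enforce additional constraints (such as length limits) that are not covered here.

```python
import re

# Starts with a lowercase letter; then lowercase letters, digits, periods, or hyphens.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9.-]*")


def is_valid_hpa_name(name: str) -> bool:
    """Check a candidate HPA object name against the rule from the table."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("nginxtest", "hpa-example.v2", "Nginx", "1hpa", "hpa_example"):
    print(candidate, is_valid_hpa_name(candidate))
```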

Detailed Explanation of HPA API Objects

The UK8S console creates HPA objects through the Kubernetes API version autoscaling/v2beta2.

Note: For clusters earlier than version 1.26, use autoscaling/v2beta2. For clusters of version 1.26 and later, use autoscaling/v2, because autoscaling/v2beta2 is no longer served starting with Kubernetes 1.26.
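
The version rule can be written as a small helper for scripts that generate manifests for several clusters. This Python sketch assumes the cut-over at 1.26, where upstream Kubernetes stopped serving autoscaling/v2beta2; the function name hpa_api_version is hypothetical.

```python
def hpa_api_version(major: int, minor: int) -> str:
    """Pick the HPA apiVersion for a cluster, given its Kubernetes major and minor version."""
    if (major, minor) >= (1, 26):
        return "autoscaling/v2"
    return "autoscaling/v2beta2"


print(hpa_api_version(1, 24))
print(hpa_api_version(1, 26))
```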


apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginxtest
  namespace: default
spec:
  maxReplicas: 5 # Maximum number of replicas
  minReplicas: 1 # Minimum number of replicas
  metrics:
    # Set the trigger scaling CPU usage rate
    - type: Resource
      resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
    # Set the trigger scaling MEM usage rate
    - type: Resource
      resource:
        name: memory
        target:
          averageUtilization: 50
          type: Utilization
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment # Type of resource to be scaled
    name: nginxtest  # Name of the resource to be scaled
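
When the same object has to be generated programmatically, for example with a different apiVersion per cluster, the manifest can be built as a dictionary and written out as JSON, which kubectl apply -f also accepts. This is a sketch under the assumption that the fields shown above are all that is needed; the function name build_hpa_manifest and its parameters are hypothetical.

```python
import json
from typing import Dict


def build_hpa_manifest(name: str, namespace: str, target_kind: str, target_name: str,
                       min_replicas: int, max_replicas: int,
                       utilization_targets: Dict[str, int],
                       api_version: str = "autoscaling/v2") -> dict:
    """Build an HPA object with one Resource metric per entry in utilization_targets."""
    metrics = [
        {
            "type": "Resource",
            "resource": {
                "name": resource_name,
                "target": {"type": "Utilization", "averageUtilization": percent},
            },
        }
        for resource_name, percent in utilization_targets.items()
    ]
    return {
        "apiVersion": api_version,
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": metrics,
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": target_kind, "name": target_name},
        },
    }


# Same values as the YAML example above, targeting autoscaling/v2.
manifest = build_hpa_manifest("nginxtest", "default", "Deployment", "nginxtest",
                              1, 5, {"cpu": 50, "memory": 50})
print(json.dumps(manifest, indent=2))
```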

Case Study

Below we will use a simple example to see how HPA works.

1. Deploy test application


kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/hpa-example.yaml

The test application is a compute-intensive PHP script:


<?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
?>

2. Enable HPA for test application


kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/hpa.yaml

3. Deploy pressure test tool


kubectl apply -f https://docs.surfercloud.com/uk8s/yaml/hpa/load.yaml

The pressure test tool is a busybox container. After the container starts, it requests the test application in an endless loop:


while true; do wget -q -O- http://hpa-example.default.svc.cluster.local; done

4. Check the load situation of the test application


kubectl top pods | grep hpa-example
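
To compare the reported load with the HPA target, the CPU column of the kubectl top pods output can be averaged. The Python sketch below parses a made-up sample in the usual NAME / CPU(cores) / MEMORY(bytes) layout; the Pod names and values are hypothetical and do not come from a real run, and the function names are assumptions.

```python
def parse_cpu_millicores(value: str) -> float:
    """Convert a CPU quantity such as '480m' or '1' (whole cores) to millicores."""
    return float(value[:-1]) if value.endswith("m") else float(value) * 1000


def average_cpu(top_output: str, name_prefix: str) -> float:
    """Average the CPU column over Pods whose name starts with name_prefix."""
    usages = []
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith(name_prefix):
            usages.append(parse_cpu_millicores(fields[1]))
    if not usages:
        raise ValueError("no Pods matched prefix " + name_prefix)
    return sum(usages) / len(usages)


# Hypothetical sample output, for illustration only.
SAMPLE = """NAME CPU(cores) MEMORY(bytes)
hpa-example-aaa 480m 12Mi
hpa-example-bbb 520m 11Mi
load-generator 30m 1Mi
"""

average = average_cpu(SAMPLE, "hpa-example")
print(average)
```

Dividing this average by the container's CPU Request gives the utilization percentage that HPA compares with its target.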

5. Once the average CPU utilization of the test application exceeds 55%, HPA starts scaling up the Pods.


kubectl get deploy | grep hpa-example