Autoscaling in Kubernetes using HPA and VPA

Autoscaling, a key feature of Kubernetes, lets you improve the resource utilization of your cluster by automatically adjusting an application’s resources or replica count based on the current load.

This blog post covers Pod Autoscaling in Kubernetes and how to set up and configure autoscalers to optimize your application’s resource utilization.

Horizontal Pod Autoscaling

What is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) scales the number of pods of a ReplicaSet, Deployment, or StatefulSet based on per-pod metrics received from the resource metrics API (metrics.k8s.io) provided by metrics-server, the custom metrics API (custom.metrics.k8s.io), or the external metrics API (external.metrics.k8s.io).

Fig:- Horizontal Pod Autoscaling

Prerequisite

Verify that the metrics-server is already deployed and running using the command below, or deploy it using the instructions here.

kubectl get deployment metrics-server -n kube-system

HPA using Multiple Resource Metrics

HPA fetches per-pod resource metrics (like CPU, memory) from the resource metrics API and calculates the current metric value based on the mean values of all targeted pods. It compares the current metric value with the target metric value specified in the HPA spec and produces a ratio used to scale the number of desired replicas.
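The ratio calculation described above follows the standard HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal Python sketch of that formula (the numbers below are illustrative):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """HPA scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 3 pods averaging 852Mi of memory against a 500Mi target -> scale up to 6.
print(desired_replicas(3, 852, 500))  # 6

# 3 pods at 36% CPU utilization against a 50% target -> stays at 3.
print(desired_replicas(3, 36, 50))    # 3
```

Note that the real controller also applies a tolerance band and stabilization windows before acting on this number, so small ratio deviations don't cause churn.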

A. Setup: Create a Deployment and HPA resource

In this blog post, I have used the config below to create a deployment of 3 replicas, with some memory load defined by "--vm-bytes", "850M".

apiVersion: apps/v1
kind: Deployment
metadata:
 name: autoscale-tester
spec:
 replicas: 3
 selector:
   matchLabels:
     app: autoscale-tester
 template:
   metadata:
     labels:
       app: autoscale-tester
   spec:
     containers:
     - args: [ "--vm", "1", "--vm-bytes", "850M", "--vm-hang", "1"]
       command:
       - stress
       image: polinux/stress
       name: autoscale-tester
       resources:
         limits:
           cpu: "1"
           memory: 1000Mi
         requests:
           cpu: "1"
           memory: 1000Mi

NOTE: It’s recommended not to use HPA and VPA on the same pods or deployments.

kubectl top po
NAME                            	CPU(cores)   MEMORY(bytes)   
autoscale-tester-878b8c6c8-42gmk   326m     	853Mi      	 
autoscale-tester-878b8c6c8-gp45f   410m     	852Mi      	 
autoscale-tester-878b8c6c8-tz4mg   388m     	852Mi 

Let's create an HPA resource for this deployment with multiple metric blocks defined. The HPA considers each metric in turn, calculates a desired replica count for each of them, and then selects the one with the highest replica count.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
 name: autoscale-tester
spec:
 scaleTargetRef:
   apiVersion: apps/v1
   kind: Deployment
   name: autoscale-tester
 minReplicas: 1
 maxReplicas: 10
 metrics:
 - type: Resource
   resource:
     name: cpu
     target:
       type: Utilization
       averageUtilization: 50
 - type: Resource
   resource:
     name: memory
     target:
       type: AverageValue
       averageValue: 500Mi

  • We have defined the minimum number of replicas HPA can scale down to as 1 and the maximum number it can scale up to as 10.
  • Target Average Utilization and Target Average Value imply that the HPA should scale the replicas up/down to keep the Current Metric Value equal or closest to the Target Metric Value.

B. Understanding the HPA Algorithm

kubectl describe hpa autoscale-tester
Name:       autoscale-tester
Namespace:  autoscale-tester
...
Metrics:                                           	( current / target )
  resource memory on pods:                         	894188202666m / 500Mi
  resource cpu on pods  (as a percentage of request):  36% (361m) / 50%
Min replicas:                                      	1
Max replicas:                                      	10
Deployment pods:                                   	3 current / 6 desired
Conditions:
  Type        	Status  Reason          	Message
  ----        	------  ------          	-------
  AbleToScale 	True	SucceededRescale	the HPA controller was able to update the target scale to 6
  ScalingActive   True	ValidMetricFound	the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type	Reason         	Age   From                   	Message
  ----	------         	----  ----                   	-------
  Normal  SuccessfulRescale  7s	horizontal-pod-autoscaler  New size: 6; reason: memory resource above target

  • HPA calculates pod utilization as the total usage of all containers in the pod divided by the total request. It looks at each container individually and skips the calculation if a container doesn’t have a request set.
  • The calculated Current Metric Value for memory, i.e., 894188202666m, is higher than the Target Average Value of 500Mi, so the replicas need to be scaled up.
  • The calculated Current Metric Value for CPU, i.e., 36%, is lower than the Target Average Utilization of 50%, so on this metric alone the replicas could be scaled down.
  • Replicas are calculated based on both metrics, and the highest replica count is selected. So, the replicas are scaled up to 6 in this case.
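The arithmetic behind the "3 current / 6 desired" line can be checked by hand with the standard formula desiredReplicas = ceil(currentReplicas × currentValue / targetValue), using the values from the kubectl describe output above:

```python
import math

def desired(current_replicas, current, target):
    return math.ceil(current_replicas * current / target)

# Memory: 894188202666m (milli-bytes) vs a 500Mi target.
mem = desired(3, 894188202666 / 1000, 500 * 1024**2)  # ceil(3 * 1.705...) = 6

# CPU: 36% utilization vs a 50% target.
cpu = desired(3, 36, 50)                              # ceil(3 * 0.72) = 3

print(max(mem, cpu))  # HPA picks the highest count: 6
```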

HPA using Custom metrics

We will use the prometheus-adapter resource to expose custom application metrics to custom.metrics.k8s.io/v1beta1, which are retrieved by HPA. By defining our own metrics through the adapter’s configuration, we can let HPA perform scaling based on our custom metrics.

A. Setup: Install Prometheus Adapter

Create prometheus-adapter.yaml with the content below:

prometheus:
 url: http://prometheus-server
 port: 80
image:
 tag: latest
rules:
 custom:
   - seriesQuery: 'container_network_receive_packets_total{namespace!="",pod!=""}'
     resources:
       overrides:
         namespace: {resource: "namespace"}
         pod: {resource: "pod"}
     name:
       matches: "container_network_receive_packets_total"
       as: "packets_in"
     metricsQuery: <<.Series>>{<<.LabelMatchers>>}

helm install stable/prometheus -n prometheus --namespace prometheus
helm install stable/prometheus-adapter -n prometheus-adapter --namespace prometheus -f prometheus-adapter.yaml

Once the charts are deployed, verify the metrics are exposed at v1beta1.custom.metrics.k8s.io:

kubectl get apiservice
NAME                               	SERVICE                     	AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io      	prometheus/prometheus-adapter   True    	19m 


kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/*/packets_in | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
	"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/%2A/packets_in"
  },
 "items": [
	{
  	"describedObject": {
    	"kind": "Pod",
    	"namespace": "autoscale-hpa",
    	"name": "autoscale-tester-878b8c6c8-42gmk",
    	"apiVersion": "/v1"
  	},
  	"metricName": "packets_in",
  	"timestamp": "2020-07-31T05:59:33Z",
  	"value": "33",
  	"selector": null
	},
	{
  	"describedObject": {
    	"kind": "Pod",
    	"namespace": "autoscale-hpa",
    	"name": "autoscale-tester-878b8c6c8-hfts8",
    	"apiVersion": "/v1"
  	},
  	"metricName": "packets_in",
  	"timestamp": "2020-07-31T05:59:33Z",
  	"value": "11",
  	"selector": null
	},
	{
  	"describedObject": {
    	"kind": "Pod",
    	"namespace": "autoscale-hpa",
    	"name": "autoscale-tester-878b8c6c8-rb9v2",
    	"apiVersion": "/v1"
  	},
  	"metricName": "packets_in",
  	"timestamp": "2020-07-31T05:59:33Z",
  	"value": "10",
  	"selector": null
	}
  ]
}

You can see the metrics value of all the replicas in the output.

B. Understanding Prometheus Adapter Configuration

The adapter considers metrics defined with the parameters below:

1. seriesQuery tells the adapter which Prometheus metric (series) to discover.

2. resources tells which Kubernetes resources each metric is associated with, i.e., which labels the metric carries, e.g., namespace, pod, etc.

3. metricsQuery is the actual Prometheus query that is performed to calculate the metric values.

4. name defines the name under which the metric is exposed to the custom metrics API.

For instance, if we want to calculate the rate of container_network_receive_packets_total, we will need to write this query in Prometheus UI:

sum(rate(container_network_receive_packets_total{namespace="autoscale-tester",pod=~"autoscale-tester.*"}[10m])) by (pod)

This query is represented as below in the adapter configuration:

metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'
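The <<.Series>>, <<.LabelMatchers>>, and <<.GroupBy>> placeholders are Go template fields that the adapter fills in at query time. A rough Python illustration of that substitution (the filled-in values below are examples taken from the query above, not adapter internals):

```python
template = 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'

# Substitute the template fields the way the adapter would for our metric.
query = (template
         .replace('<<.Series>>', 'container_network_receive_packets_total')
         .replace('<<.LabelMatchers>>',
                  'namespace="autoscale-tester",pod=~"autoscale-tester.*"')
         .replace('<<.GroupBy>>', 'pod'))

print(query)
```

Running this reproduces the PromQL query shown earlier.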

C. Create an HPA resource

Now, let’s create an HPA resource with the pod metric packets_in using the config below, and then describe the HPA resource.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
 name: autoscale-tester
spec:
 scaleTargetRef:
   apiVersion: apps/v1
   kind: Deployment
   name: autoscale-tester
 minReplicas: 1
 maxReplicas: 10
 metrics:
 - type: Pods
   pods:
     metric:
       name: packets_in
     target:
       type: AverageValue
       averageValue: 50

kubectl describe hpa autoscale-tester
Name:                	autoscale-tester
Namespace:           	autoscale-tester
...
Metrics:             	( current / target )
  "packets_in" on pods:  18666m / 50
Min replicas:        	1
Max replicas:        	10
Deployment pods:     	3 current / 3 desired
Conditions:
  Type        	Status  Reason          	Message
  ----        	------  ------          	-------
  AbleToScale 	True	SucceededRescale	the HPA controller was able to update the target scale to 2
  ScalingActive   True	ValidMetricFound	the HPA was able to successfully calculate a replica count from pods metric packets_in
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type	Reason         	Age   From                   	Message
  ----	------         	----  ----                   	-------
  Normal  SuccessfulRescale  2s	horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
  Normal  SuccessfulRescale  2m51s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target 

Here, the current calculated metric value is 18666m. The m represents milli-units, so 18666m means 18.666, which roughly matches the per-pod values shown earlier ((33 + 11 + 10) / 3 = 18; the samples were scraped at slightly different times). Since it’s less than the target average value (i.e., 50), the HPA scales down the replicas to bring the Current Metric Value : Target Metric Value ratio closest to 1. Hence, replicas are scaled down to 2 and later to 1.
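The scale-down to 2 replicas follows from the same desiredReplicas formula:

```python
import math

current = 18.666   # average packets_in across the 3 pods (reported as 18666m)
target = 50        # averageValue in the HPA spec

desired = math.ceil(3 * current / target)  # ceil(3 * 0.373...) = 2
print(desired)  # 2
```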

Fig:- container_network_receive_packets_total

 

Fig:- Ratio to Target value

Vertical Pod Autoscaling

What is the Vertical Pod Autoscaler?

Vertical Pod autoscaling (VPA) ensures that a container’s resources are not under- or over-utilized. It recommends optimized CPU and memory requests/limits values, and can also automatically update them for you so that the cluster resources are efficiently used.

Fig:- Vertical Pod Autoscaling

Architecture

VPA consists of 3 components:

  • VPA admission controller
    Once you deploy and enable the Vertical Pod Autoscaler in your cluster, every pod submitted to the cluster goes through this webhook, which checks whether a VPA object references it and, if so, rewrites the pod’s resource requests with the current recommendation.
  • VPA recommender
    The recommender pulls the current and past resource consumption (CPU and memory) data for each container from metrics-server running in the cluster and provides optimal resource recommendations based on it, so that a container uses only what it needs.
  • VPA updater
    The updater checks at regular intervals whether a pod is running within the recommended range. If not, it accepts the pod for update and evicts it, so that the resource recommendation can be applied when the pod is recreated.
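The updater's core check can be sketched as a range test against the recommender's bounds. This is a simplification (the real updater also applies update priorities and eviction rate limits), and the function name below is illustrative:

```python
def needs_update(request, lower_bound, upper_bound):
    """Evict only when the current request falls outside the recommended range."""
    return not (lower_bound <= request <= upper_bound)

print(needs_update(1000, 500, 4096))  # within bounds -> False, pod left alone
print(needs_update(100, 500, 4096))   # under-provisioned -> True, pod evicted
```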

Installation

If you are on Google Cloud Platform, you can simply enable vertical-pod-autoscaling:

gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling

To install it manually, follow the steps below:

  • Verify that the metrics-server deployment is running, or deploy it using instructions here.
kubectl get deployment metrics-server -n kube-system

  • Also, verify the API below is enabled:
kubectl api-versions | grep admissionregistration
admissionregistration.k8s.io/v1beta1

  • Clone the kubernetes/autoscaler GitHub repository, and then deploy the Vertical Pod Autoscaler with the following command.
git clone https://github.com/kubernetes/autoscaler.git
./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh

Verify that the Vertical Pod Autoscaler pods are up and running:

kubectl get po -n kube-system
NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-68c748777d-ppspd   1/1     Running   0          7s
vpa-recommender-6fc8c67d85-gljpl            1/1     Running   0          8s
vpa-updater-786b96955c-bgp9d                1/1     Running   0          8s

kubectl get crd
verticalpodautoscalers.autoscaling.k8s.io 

VPA using Resource Metrics

A. Setup: Create a Deployment and VPA resource

Use the same deployment config to create a new deployment with "--vm-bytes", "850M". Then create a VPA resource in Recommendation Mode with updateMode: "Off".

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
 name: autoscale-tester-recommender
spec:
 targetRef:
   apiVersion: "apps/v1"
   kind:       Deployment
   name:       autoscale-tester
 updatePolicy:
   updateMode: "Off"
 resourcePolicy:
   containerPolicies:
   - containerName: autoscale-tester
     minAllowed:
       cpu: "500m"
       memory: "500Mi"
     maxAllowed:
       cpu: "4"
       memory: "8Gi"

  • minAllowed is an optional parameter that specifies the minimum CPU request and memory request allowed for the container. 
  • maxAllowed is an optional parameter that specifies the maximum CPU request and memory request allowed for the container.

B. Check the Pod’s Resource Utilization

Check the resource utilization of the pods. Below, you can see only ~50 Mi memory is being used out of 1000Mi and only ~30m CPU out of 1000m. This clearly indicates that the pod resources are underutilized.

kubectl top po
NAME                            	CPU(cores)   MEMORY(bytes)   
autoscale-tester-5d6b48d64f-8zgb9   39m      	51Mi       	 
autoscale-tester-5d6b48d64f-npts4   32m      	50Mi       	 
autoscale-tester-5d6b48d64f-vctx5   35m      	50Mi 

If you describe the VPA resource, you can see the Recommendations provided. (It may take some time to show them.)

kubectl describe vpa autoscale-tester-recommender
Name:     	autoscale-tester-recommender
Namespace:	autoscale-tester
...
  Recommendation:
	Container Recommendations:
  	Container Name:  autoscale-tester
  	Lower Bound:
    	Cpu: 	500m
    	Memory:  500Mi
  	Target:
    	Cpu: 	500m
    	Memory:  500Mi
  	Uncapped Target:
    	Cpu: 	93m
    	Memory:  262144k
  	Upper Bound:
    	Cpu: 	4
    	Memory:  4Gi

C. Understand the VPA recommendations

Target: The recommended CPU request and memory request for the container that will be applied to the pod by VPA.

Uncapped Target: The recommended CPU request and memory request for the container if you didn’t configure upper/lower limits in the VPA definition. These values will not be applied to the pod. They’re used only as a status indication.

Lower Bound: The minimum recommended CPU request and memory request for the container. There is a --pod-recommendation-min-memory-mb flag that determines the minimum amount of memory the recommender will set; it defaults to 250MiB.

Upper Bound: The maximum recommended CPU request and memory request for the container. It helps the VPA updater avoid evicting pods that are already close to the recommended target values. Eventually, the Upper Bound is expected to converge close to the Target recommendation.

 Recommendation:
	Container Recommendations:
  	Container Name:  autoscale-tester
  	Lower Bound:
    	Cpu: 	500m
    	Memory:  500Mi
  	Target:
    	Cpu: 	500m
    	Memory:  500Mi
  	Uncapped Target:
    	Cpu: 	93m
    	Memory:  262144k
  	Upper Bound:
    	Cpu: 	500m
    	Memory:  1274858485 
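The relationship between Uncapped Target and Target is a simple clamp of the raw recommendation into the [minAllowed, maxAllowed] range from the VPA spec. A sketch, using the values from the output above (memory in Mi, CPU in millicores):

```python
def clamp(uncapped, min_allowed, max_allowed):
    """Target = Uncapped Target clamped into [minAllowed, maxAllowed]."""
    return max(min_allowed, min(uncapped, max_allowed))

# Uncapped memory of 262144k (~256Mi) is raised to the 500Mi minAllowed.
print(clamp(256, 500, 8192))  # 500

# Uncapped CPU of 93m is raised to the 500m minAllowed.
print(clamp(93, 500, 4000))   # 500
```

This is why the Target shows 500m / 500Mi even though the uncapped recommendation is far lower.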

D. VPA processing with Update Mode Off/Auto

Now, if you check the logs of vpa-updater, you can see it’s not processing VPA objects because the Update Mode is set to Off.

kubectl logs -f vpa-updater-675d47464b-k7xbx
1 updater.go:135] skipping VPA object autoscale-tester-recommender because its mode is not "Recreate" or "Auto"
1 updater.go:151] no VPA objects to process

VPA allows various Update Modes, detailed here.

Let’s change the VPA updateMode to “Auto” to see the processing.

As soon as you do that, you can see vpa-updater has started processing objects, and it’s terminating all 3 pods.

kubectl logs -f vpa-updater-675d47464b-k7xbx
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-8zgb9 with priority 1
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-npts4 with priority 1
1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-vctx5 with priority 1
1 updater.go:193] evicting pod autoscale-tester-5d6b48d64f-8zgb9
1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"autoscale-tester", Name:"autoscale-tester-5d6b48d64f-8zgb9", UID:"ed8c54c7-a87a-4c39-a000-0e74245f18c6", APIVersion:"v1", ResourceVersion:"378376", FieldPath:""}): 
type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

You can also check the logs of vpa-admission-controller:

kubectl logs -f vpa-admission-controller-bbf4f4cc7-cb6pb
Sending patches: [{add /metadata/annotations map[]} {add /spec/containers/0/resources/requests/cpu 500m} {add /spec/containers/0/resources/requests/memory 500Mi} {add /spec/containers/0/resources/limits/cpu 500m} {add /spec/containers/0/resources/limits/memory 500Mi} {add /metadata/annotations/vpaUpdates Pod resources updated by autoscale-tester-recommender: container 0: cpu request, memory request, cpu limit, memory limit} {add /metadata/annotations/vpaObservedContainers autoscale-tester}]

NOTE: Ensure that you have more than 1 running replica. Otherwise, the pods won’t be restarted, and vpa-updater will give you this warning:

1 pods_eviction_restriction.go:209] too few replicas for ReplicaSet autoscale-tester/autoscale-tester1-7698974f6. Found 1 live pods

Now, describe the new pods created and check that the resources match the Target recommendations:

kubectl get po
NAME                            	READY   STATUS    	RESTARTS   AGE
autoscale-tester-5d6b48d64f-5dlb7   1/1 	Running   	0      	77s
autoscale-tester-5d6b48d64f-9wq4w   1/1 	Running   	0      	37s
autoscale-tester-5d6b48d64f-qrlxn   1/1 	Running   	0      	17s


kubectl describe po autoscale-tester-5d6b48d64f-5dlb7
Name:     	autoscale-tester-5d6b48d64f-5dlb7
Namespace:	autoscale-tester
...
	Limits:
  	cpu: 	500m
  	memory:  500Mi
	Requests:
  	cpu:    	500m
  	memory: 	500Mi
	Environment:  <none>

The Target recommendation cannot go below the minAllowed defined in the VPA spec.

Fig:- Prometheus: Memory Usage Ratio

E. Stress Loading Pods

Let’s recreate the deployment with memory request and limit set to 2000Mi and "--vm-bytes", "500M".

Gradually stress load one of these pods to increase its memory utilization.
You can log in to the pod and run stress --vm 1 --vm-bytes 1400M --timeout 120000s.


kubectl top po
NAME                            	CPU(cores)   MEMORY(bytes)   
autoscale-tester-5d6b48d64f-5dlb7   1000m     	1836Mi       	 
autoscale-tester-5d6b48d64f-9wq4w   252m      	501Mi       	 
autoscale-tester-5d6b48d64f-qrlxn   252m      	501Mi 	

Fig:- Prometheus memory utilized by each Replica

You will notice that the VPA recommendation is also calculated accordingly and applied to all replicas.

kubectl describe vpa autoscale-tester-recommender
Name:     	autoscale-tester-recommender
Namespace:	autoscale-tester
...
  Recommendation:
	Container Recommendations:
  	Container Name:  autoscale-tester
  	Lower Bound:
    	Cpu: 	500m
    	Memory:  500Mi
  	Target:
    	Cpu: 	500m
    	Memory:  628694953
  	Uncapped Target:
    	Cpu: 	49m
    	Memory:  628694953
  	Upper Bound:
    	Cpu: 	500m
    	Memory:  1553712527

Limits v/s Requests
VPA always works with the requests defined for a container, not the limits. The VPA recommendations are therefore applied to the container requests, and the limits are scaled proportionally so that the limit-to-request ratio specified for each container is maintained.

For example, if the initial container configuration defines a 100Mi memory request and a 300Mi memory limit, then when the VPA target recommendation is 150Mi of memory, the container memory request will be updated to 150Mi and the memory limit to 450Mi.
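That proportional limit scaling can be sketched as:

```python
def scaled_limit(orig_request, orig_limit, new_request):
    """VPA preserves the original limit-to-request ratio when it
    rewrites the request with the recommended target."""
    return new_request * orig_limit / orig_request

# Original request 100, limit 300 (ratio 3x); new target request 150 -> limit 450.
print(scaled_limit(100, 300, 150))  # 450.0
```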

Selective Container Scaling

If you have a pod with multiple containers and you want to opt some of them out, you can use the "Off" mode to turn off recommendations for a container.

You can also set containerName: "*" to include all containers.

spec:
 targetRef:
   apiVersion: "apps/v1"
   kind:       Deployment
   name:       autoscale-tester
 updatePolicy:
   updateMode: "Auto"
 resourcePolicy:
   containerPolicies:
   - containerName: autoscale-tester
     minAllowed:
       cpu: "500m"
       memory: "500Mi"
     maxAllowed:
       cpu: "4"
       memory: "4Gi"
   - containerName: opt-out-container
     mode: "Off"

Conclusion

Both the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler serve different purposes, and one can be more useful than the other depending on your application’s requirements.

The HPA can be useful when, for example, your application serves a large number of lightweight (low resource-consuming) requests; scaling the number of replicas then distributes the workload across the pods. The VPA, on the other hand, can be useful when your application serves heavyweight requests that require more resources per pod.
