    Autoscaling in Kubernetes using HPA and VPA

    Autoscaling, a key feature of Kubernetes, lets you improve the resource utilization of your cluster by automatically adjusting the application’s resources or replicas depending on the load at that time.

    This blog talks about Pod Autoscaling in Kubernetes and how to set up and configure autoscalers to optimize the resource utilization of your application.

    Horizontal Pod Autoscaling

    What is the Horizontal Pod Autoscaler?

    The Horizontal Pod Autoscaler (HPA) scales the number of pods of a ReplicaSet, Deployment, or StatefulSet based on per-pod metrics received from the resource metrics API (metrics.k8s.io) provided by metrics-server, the custom metrics API (custom.metrics.k8s.io), or the external metrics API (external.metrics.k8s.io).
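
    If metrics-server is running, you can query the resource metrics API directly to see the raw per-pod data the HPA works with, for example:

    kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods | jq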

    Fig:- Horizontal Pod Autoscaling

    Prerequisite

    Verify that the metrics-server is already deployed and running using the command below, or deploy it using instructions here.

    kubectl get deployment metrics-server -n kube-system

    HPA using Multiple Resource Metrics

    HPA fetches per-pod resource metrics (like CPU, memory) from the resource metrics API and calculates the current metric value based on the mean values of all targeted pods. It compares the current metric value with the target metric value specified in the HPA spec and produces a ratio used to scale the number of desired replicas.
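
    Concretely, the desired replica count follows the standard HPA formula:

    desiredReplicas = ceil( currentReplicas * ( currentMetricValue / desiredMetricValue ) )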

    A. Setup: Create a Deployment and HPA resource

    In this blog post, I have used the config below to create a deployment of 3 replicas, with some memory load defined by “--vm-bytes”, “850M”.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: autoscale-tester
    spec:
     replicas: 3
     selector:
       matchLabels:
         app: autoscale-tester
     template:
       metadata:
         labels:
           app: autoscale-tester
       spec:
         containers:
         - args: [ "--vm", "1", "--vm-bytes", "850M", "--vm-hang", "1"]
           command:
           - stress
           image: polinux/stress
           name: autoscale-tester
           resources:
             limits:
               cpu: "1"
               memory: 1000Mi
             requests:
               cpu: "1"
               memory: 1000Mi

    NOTE: It’s recommended not to use HPA and VPA on the same pods or deployments.

    kubectl top po
    NAME                            	CPU(cores)   MEMORY(bytes)   
    autoscale-tester-878b8c6c8-42gmk   326m     	853Mi      	 
    autoscale-tester-878b8c6c8-gp45f   410m     	852Mi      	 
    autoscale-tester-878b8c6c8-tz4mg   388m     	852Mi 

    Let’s create an HPA resource for this deployment with multiple metric blocks defined. The HPA considers each metric in turn, calculates the desired replica count for each of them, and then selects the one with the highest replica count.

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
     name: autoscale-tester
    spec:
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: autoscale-tester
     minReplicas: 1
     maxReplicas: 10
     metrics:
     - type: Resource
       resource:
         name: cpu
         target:
           type: Utilization
           averageUtilization: 50
     - type: Resource
       resource:
         name: memory
         target:
           type: AverageValue
           averageValue: 500Mi

    • We have defined the minimum number of replicas HPA can scale down to as 1 and the maximum number it can scale up to as 10.
    • The target average utilization and target average value tell the HPA to scale the replicas up or down so that the current metric value stays equal to, or as close as possible to, the target metric value (see the utilization example below).
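
    For the deployment above, each pod requests 1000m CPU, so utilization is computed per pod as usage divided by request and then averaged across the targeted pods; this is where the 36% figure in the HPA status further below comes from:

    averageUtilization = ( average CPU usage / CPU request ) * 100
                       = ( 361m / 1000m ) * 100
                       ≈ 36%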

    B. Understanding the HPA Algorithm

    kubectl describe hpa autoscale-tester
    Name:       autoscale-tester
    Namespace:  autoscale-tester
    ...
    Metrics:                                           	( current / target )
      resource memory on pods:                         	894188202666m / 500Mi
      resource cpu on pods  (as a percentage of request):  36% (361m) / 50%
    Min replicas:                                      	1
    Max replicas:                                      	10
    Deployment pods:                                   	3 current / 6 desired
    Conditions:
      Type        	Status  Reason          	Message
      ----        	------  ------          	-------
      AbleToScale 	True	SucceededRescale	the HPA controller was able to update the target scale to 6
      ScalingActive   True	ValidMetricFound	the HPA was able to successfully calculate a replica count from memory resource
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:
      Type	Reason         	Age   From                   	Message
      ----	------         	----  ----                   	-------
      Normal  SuccessfulRescale  7s	horizontal-pod-autoscaler  New size: 6; reason: memory resource above target

    • HPA calculates pod utilization as the total usage of all containers in the pod divided by the total requests. It looks at each container individually and cannot compute utilization if a container does not have the corresponding request set.
    • The calculated Current Metric Value for memory, i.e., 894188202666m, is higher than the Target Average Value of 500Mi, so the replicas need to be scaled up.
    • The calculated Current Metric Value for CPU, i.e., 36%, is lower than the Target Average Utilization of 50%, so on that metric alone the replicas could be scaled down.
    • Replicas are calculated for both metrics and the highest replica count is selected. So, the replicas are scaled up to 6 in this case, as shown in the worked calculation below.
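
    Putting the numbers from the status output into the scaling formula:

    memory:  894188202666m ≈ 852Mi average per pod
             ceil( 3 * 852Mi / 500Mi ) = ceil(5.1) = 6 replicas
    cpu:     ceil( 3 * 36% / 50% ) = ceil(2.2) = 3 replicas

    The higher of the two counts (6) is selected, which matches the "3 current / 6 desired" line above.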

    HPA using Custom metrics

    We will use the prometheus-adapter resource to expose custom application metrics to custom.metrics.k8s.io/v1beta1, which are retrieved by HPA. By defining our own metrics through the adapter’s configuration, we can let HPA perform scaling based on our custom metrics.

    A. Setup: Install Prometheus Adapter

    Create prometheus-adapter.yaml with the content below:

    prometheus:
      url: http://prometheus-server
      port: 0
    image:
      tag: latest
    rules:
      custom:
        - seriesQuery: 'container_network_receive_packets_total{namespace!="",pod!=""}'
          resources:
            overrides:
              namespace: {resource: "namespace"}
              pod: {resource: "pod"}
          name:
            matches: "container_network_receive_packets_total"
            as: "packets_in"
          metricsQuery: <<.Series>>{<<.LabelMatchers>>}

    helm install stable/prometheus -n prometheus --namespace prometheus
    helm install stable/prometheus-adapter -n prometheus-adapter --namespace prometheus -f prometheus-adapter.yaml

    Once the charts are deployed, verify the metrics are exposed at v1beta1.custom.metrics.k8s.io:

    kubectl get apiservice
    NAME                               	SERVICE                     	AVAILABLE   AGE
    v1beta1.custom.metrics.k8s.io      	prometheus/prometheus-adapter   True    	19m 
    
    
    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/*/packets_in | jq
    {
      "kind": "MetricValueList",
      "apiVersion": "custom.metrics.k8s.io/v1beta1",
      "metadata": {
    	"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/autoscale-hpa/pods/%2A/packets_in"
      },
     "items": [
    	{
      	"describedObject": {
        	"kind": "Pod",
        	"namespace": "autoscale-hpa",
        	"name": "autoscale-tester-878b8c6c8-42gmk",
        	"apiVersion": "/v1"
      	},
      	"metricName": "packets_in",
      	"timestamp": "2020-07-31T05:59:33Z",
      	"value": "33",
      	"selector": null
    	},
    	{
      	"describedObject": {
        	"kind": "Pod",
        	"namespace": "autoscale-hpa",
        	"name": "autoscale-tester-878b8c6c8-hfts8",
        	"apiVersion": "/v1"
      	},
      	"metricName": "packets_in",
      	"timestamp": "2020-07-31T05:59:33Z",
      	"value": "11",
      	"selector": null
    	},
    	{
      	"describedObject": {
        	"kind": "Pod",
        	"namespace": "autoscale-hpa",
        	"name": "autoscale-tester-878b8c6c8-rb9v2",
        	"apiVersion": "/v1"
      	},
      	"metricName": "packets_in",
      	"timestamp": "2020-07-31T05:59:33Z",
      	"value": "10",
      	"selector": null
    	}
      ]
    }

    You can see the metrics value of all the replicas in the output.

    B. Understanding Prometheus Adapter Configuration

    The adapter considers metrics defined with the parameters below:

    1. seriesQuery tells the adapter which Prometheus metric (series) to discover.

    2. resources tells the adapter which Kubernetes resources each metric is associated with, i.e., which labels on the metric map to resources such as namespace, pod, etc.

    3. metricsQuery is the actual Prometheus query that is run to calculate the metric values.

    4. name defines the name under which the metric is exposed to the custom metrics API.

    For instance, if we want to calculate the rate of container_network_receive_packets_total, we will need to write this query in Prometheus UI:

    sum(rate(container_network_receive_packets_total{namespace="autoscale-tester",pod=~"autoscale-tester.*"}[10m])) by (pod)

    This query is represented as below in the adapter configuration:

    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[10m])) by (<<.GroupBy>>)'

    C. Create an HPA resource

    Now, let’s create an HPA resource with the pod metric packets_in using the config below, and then describe the HPA resource.

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
     name: autoscale-tester
    spec:
     scaleTargetRef:
       apiVersion: apps/v1
       kind: Deployment
       name: autoscale-tester
     minReplicas: 1
     maxReplicas: 10
     metrics:
     - type: Pods
       pods:
         metric:
           name: packets_in
         target:
           type: AverageValue
           averageValue: 50

    kubectl describe hpa autoscale-tester
    Name:                	autoscale-tester
    Namespace:           	autoscale-tester
    ...
    Metrics:             	( current / target )
      "packets_in" on pods:  18666m / 50
    Min replicas:        	1
    Max replicas:        	10
    Deployment pods:     	3 current / 3 desired
    Conditions:
      Type        	Status  Reason          	Message
      ----        	------  ------          	-------
      AbleToScale 	True	SucceededRescale	the HPA controller was able to update the target scale to 2
      ScalingActive   True	ValidMetricFound	the HPA was able to successfully calculate a replica count from pods metric packets_in
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:
      Type	Reason         	Age   From                   	Message
      ----	------         	----  ----                   	-------
      Normal  SuccessfulRescale  2s	horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
      Normal  SuccessfulRescale  2m51s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target 

    Here, the current calculated metric value is 18666m. The m represents milli-units, so 18666m means 18.666, which is what we expect ((33 + 11 + 10) / 3 = 18.666). Since this is less than the target average value of 50, the HPA scales down the replicas to bring the ratio of current metric value to target metric value as close to 1 as possible. Hence, the replicas are scaled down to 2 and later to 1.
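
    Plugging these numbers into the scaling formula from earlier:

    ceil( 3 * 18.666 / 50 ) = ceil(1.12) = 2 replicas

    With the per-pod average still well below the target, a subsequent evaluation brings the count further down to the minReplicas value of 1.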

    Fig:- container_network_receive_packets_total

     

    Fig:- Ratio to Target value

    Vertical Pod Autoscaling

    What is the Vertical Pod Autoscaler?

    The Vertical Pod Autoscaler (VPA) ensures that a container’s resources are not under- or over-utilized. It recommends optimized CPU and memory request/limit values, and can also automatically apply them for you so that cluster resources are used efficiently.

    Fig:- Vertical Pod Autoscaling

    Architecture

    VPA consists of 3 components:

    • VPA admission controller
      Once you deploy and enable the Vertical Pod Autoscaler in your cluster, every pod submitted to the cluster goes through this admission webhook, which checks whether a VPA object references it and, if so, rewrites the pod’s resource requests with the current recommendation.
    • VPA recommender
      The recommender pulls the current and past resource consumption (CPU and memory) data for each container from metrics-server running in the cluster and provides optimal resource recommendations based on it, so that a container uses only what it needs.
    • VPA updater
      The updater checks at regular intervals whether a pod is running within the recommended range. If not, it accepts the pod for update and evicts it so that it can be recreated with the recommended resources applied.

    Installation

    If you are on Google Cloud Platform, you can simply enable vertical-pod-autoscaling:

    gcloud container clusters update <cluster-name> --enable-vertical-pod-autoscaling

    To install it manually, follow the steps below:

    • Verify that the metrics-server deployment is running, or deploy it using instructions here.
    kubectl get deployment metrics-server -n kube-system

    • Also, verify the API below is enabled:
    kubectl api-versions | grep admissionregistration
    admissionregistration.k8s.io/v1beta1

    • Clone the kubernetes/autoscaler GitHub repository, and then deploy the Vertical Pod Autoscaler with the following command.
    git clone https://github.com/kubernetes/autoscaler.git
    ./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh

    Verify that the Vertical Pod Autoscaler pods are up and running:

    kubectl get po -n kube-system
    NAME                                        READY   STATUS    RESTARTS   AGE
    vpa-admission-controller-68c748777d-ppspd   1/1     Running   0          7s
    vpa-recommender-6fc8c67d85-gljpl            1/1     Running   0          8s
    vpa-updater-786b96955c-bgp9d                1/1     Running   0          8s
    
    kubectl get crd
    verticalpodautoscalers.autoscaling.k8s.io 

    VPA using Resource Metrics

    A. Setup: Create a Deployment and VPA resource

    Use the same deployment config to create a new deployment with “--vm-bytes”, “850M”. Then create a VPA resource in recommendation mode, with updateMode: "Off".

    apiVersion: autoscaling.k8s.io/v1beta2
    kind: VerticalPodAutoscaler
    metadata:
     name: autoscale-tester-recommender
    spec:
     targetRef:
       apiVersion: "apps/v1"
       kind:       Deployment
       name:       autoscale-tester
     updatePolicy:
       updateMode: "Off"
     resourcePolicy:
       containerPolicies:
       - containerName: autoscale-tester
         minAllowed:
           cpu: "500m"
           memory: "500Mi"
         maxAllowed:
           cpu: "4"
           memory: "8Gi"

    • minAllowed is an optional parameter that specifies the minimum CPU request and memory request allowed for the container. 
    • maxAllowed is an optional parameter that specifies the maximum CPU request and memory request allowed for the container.

    B. Check the Pod’s Resource Utilization

    Check the resource utilization of the pods. Below, you can see only ~50 Mi memory is being used out of 1000Mi and only ~30m CPU out of 1000m. This clearly indicates that the pod resources are underutilized.

    kubectl top po
    NAME                            	CPU(cores)   MEMORY(bytes)   
    autoscale-tester-5d6b48d64f-8zgb9   39m      	51Mi       	 
    autoscale-tester-5d6b48d64f-npts4   32m      	50Mi       	 
    autoscale-tester-5d6b48d64f-vctx5   35m      	50Mi 

    If you describe the VPA resource, you can see the Recommendations provided. (It may take some time to show them.)

    kubectl describe vpa autoscale-tester-recommender
    Name:     	autoscale-tester-recommender
    Namespace:	autoscale-tester
    ...
      Recommendation:
    	Container Recommendations:
      	Container Name:  autoscale-tester
      	Lower Bound:
        	Cpu: 	500m
        	Memory:  500Mi
      	Target:
        	Cpu: 	500m
        	Memory:  500Mi
      	Uncapped Target:
        	Cpu: 	93m
        	Memory:  262144k
      	Upper Bound:
        	Cpu: 	4
        	Memory:  4Gi

    C. Understand the VPA recommendations

    Target: The recommended CPU request and memory request for the container that will be applied to the pod by VPA.

    Uncapped Target: The recommended CPU request and memory request for the container if you didn’t configure upper/lower limits in the VPA definition. These values will not be applied to the pod. They’re used only as a status indication.

    Lower Bound: The minimum recommended CPU request and memory request for the container. There is a --pod-recommendation-min-memory-mb flag that determines the minimum amount of memory the recommender will set; it defaults to 250MiB.

    Upper Bound: The maximum recommended CPU request and memory request for the container. It helps the VPA updater avoid evicting pods that are already close to the recommended target values. Over time, the upper bound is expected to converge toward the target recommendation.

     Recommendation:
    	Container Recommendations:
      	Container Name:  autoscale-tester
      	Lower Bound:
        	Cpu: 	500m
        	Memory:  500Mi
      	Target:
        	Cpu: 	500m
        	Memory:  500Mi
      	Uncapped Target:
        	Cpu: 	93m
        	Memory:  262144k
      	Upper Bound:
        	Cpu: 	500m
        	Memory:  1274858485 

    D. VPA processing with Update Mode Off/Auto

    Now, if you check the logs of vpa-updater, you can see that it’s not processing VPA objects because the update mode is set to Off.

    kubectl logs -f vpa-updater-675d47464b-k7xbx
    1 updater.go:135] skipping VPA object autoscale-tester-recommender because its mode is not "Recreate" or "Auto"
    1 updater.go:151] no VPA objects to process

    VPA allows various Update Modes, detailed here.

    Let’s change the VPA updateMode to “Auto” to see the processing.
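
    One way to do this in place, using the VPA object created earlier, is a merge patch (you can equally edit the manifest and re-apply it):

    kubectl patch vpa autoscale-tester-recommender --type merge -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'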

    As soon as you do that, you can see vpa-updater has started processing objects, and it’s terminating all 3 pods.

    kubectl logs -f vpa-updater-675d47464b-k7xbx
    1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-8zgb9 with priority 1
    1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-npts4 with priority 1
    1 update_priority_calculator.go:147] pod accepted for update autoscale-tester/autoscale-tester-5d6b48d64f-vctx5 with priority 1
    1 updater.go:193] evicting pod autoscale-tester-5d6b48d64f-8zgb9
    1 event.go:281] Event(v1.ObjectReference{Kind:"Pod", Namespace:"autoscale-tester", Name:"autoscale-tester-5d6b48d64f-8zgb9", UID:"ed8c54c7-a87a-4c39-a000-0e74245f18c6", APIVersion:"v1", ResourceVersion:"378376", FieldPath:""}): 
    type: 'Normal' reason: 'EvictedByVPA' Pod was evicted by VPA Updater to apply resource recommendation.

    You can also check the logs of vpa-admission-controller:

    kubectl logs -f vpa-admission-controller-bbf4f4cc7-cb6pb
    Sending patches: [{add /metadata/annotations map[]} {add /spec/containers/0/resources/requests/cpu 500m} {add /spec/containers/0/resources/requests/memory 500Mi} {add /spec/containers/0/resources/limits/cpu 500m} {add /spec/containers/0/resources/limits/memory 500Mi} {add /metadata/annotations/vpaUpdates Pod resources updated by autoscale-tester-recommender: container 0: cpu request, memory request, cpu limit, memory limit} {add /metadata/annotations/vpaObservedContainers autoscale-tester}]

    NOTE: Ensure that you have more than one running replica. Otherwise, the pods won’t be restarted, and vpa-updater will log this warning:

    1 pods_eviction_restriction.go:209] too few replicas for ReplicaSet autoscale-tester/autoscale-tester1-7698974f6. Found 1 live pods

    Now, describe the new pods created and check that the resources match the Target recommendations:

    kubectl get po
    NAME                            	READY   STATUS    	RESTARTS   AGE
    autoscale-tester-5d6b48d64f-5dlb7   1/1 	Running   	0      	77s
    autoscale-tester-5d6b48d64f-9wq4w   1/1 	Running   	0      	37s
    autoscale-tester-5d6b48d64f-qrlxn   1/1 	Running   	0      	17s
    
    
    kubectl describe po autoscale-tester-5d6b48d64f-5dlb7
    Name:     	autoscale-tester-5d6b48d64f-5dlb7
    Namespace:	autoscale-tester
    ...
    	Limits:
      	cpu: 	500m
      	memory:  500Mi
    	Requests:
      	cpu:    	500m
      	memory: 	500Mi
    	Environment:  <none>

    The target recommendation cannot go below the minAllowed value defined in the VPA spec.

    Fig:- Prometheus: Memory Usage Ratio

    E. Stress Loading Pods

    Let’s recreate the deployment with memory request and limit set to 2000Mi and “--vm-bytes”, “500M”.

    Gradually stress load one of these pods to increase its memory utilization.
    You can log in to the pod and run stress --vm 1 --vm-bytes 1400M --timeout 120000s.
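
    For example, using one of the pod names from the output below (substitute your own):

    kubectl exec -it autoscale-tester-5d6b48d64f-5dlb7 -- stress --vm 1 --vm-bytes 1400M --timeout 120000s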

    
    kubectl top po
    NAME                            	CPU(cores)   MEMORY(bytes)   
    autoscale-tester-5d6b48d64f-5dlb7   1000m     	1836Mi       	 
    autoscale-tester-5d6b48d64f-9wq4w   252m      	501Mi       	 
    autoscale-tester-5d6b48d64f-qrlxn   252m      	501Mi 	

    Fig:- Prometheus memory utilized by each Replica

    You will notice that the VPA recommendation is also calculated accordingly and applied to all replicas.

    kubectl describe vpa autoscale-tester-recommender
    Name:     	autoscale-tester-recommender
    Namespace:	autoscale-tester
    ...
      Recommendation:
    	Container Recommendations:
      	Container Name:  autoscale-tester
      	Lower Bound:
        	Cpu: 	500m
        	Memory:  500Mi
      	Target:
        	Cpu: 	500m
        	Memory:  628694953
      	Uncapped Target:
        	Cpu: 	49m
        	Memory:  628694953
      	Upper Bound:
        	Cpu: 	500m
        	Memory:  1553712527

    Limits vs. Requests
    VPA always works with the requests defined for a container, not the limits. The VPA recommendations are therefore applied to the container requests, and VPA maintains the limit-to-request ratio specified in the original container configuration.

    For example, if the initial container configuration defines a 100Mi memory request and a 300Mi memory limit, then when the VPA target recommendation is 150Mi of memory, the container memory request will be updated to 150Mi and the memory limit to 450Mi (see the sketch below).
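
    As an illustrative sketch (not the literal patch emitted by the admission controller), the update would look roughly like this:

    # Original container spec (1:3 request-to-limit ratio)
    resources:
      requests:
        memory: 100Mi
      limits:
        memory: 300Mi

    # After a VPA target recommendation of 150Mi is applied
    resources:
      requests:
        memory: 150Mi
      limits:
        memory: 450Mi   # ratio preserved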

    Selective Container Scaling

    If you have a pod with multiple containers and you want to opt-out some of them, you can use the “Off” mode to turn off recommendations for a container.

    You can also set containerName: “*” to include all containers.

    spec:
     targetRef:
       apiVersion: "apps/v1"
       kind:       Deployment
       name:       autoscale-tester
     updatePolicy:
       updateMode: "Auto"
     resourcePolicy:
       containerPolicies:
       - containerName: autoscale-tester
         minAllowed:
           cpu: "500m"
           memory: "500Mi"
         maxAllowed:
           cpu: "4"
           memory: "4Gi"
       - containerName: opt-out-container
         mode: "Off"

    Conclusion

    Both the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler serve different purposes, and one can be more useful than the other depending on your application’s requirements.

    The HPA can be useful when, for example, your application serves a large number of lightweight (low resource-consuming) requests. In that case, scaling the number of replicas distributes the workload across the pods. The VPA, on the other hand, can be useful when your application serves heavyweight requests that require more resources.

    Related Articles:

    1. A Practical Guide to Deploying Multi-tier Applications on Google Container Engine (GKE)

    2. Know Everything About Spinnaker & How to Deploy Using Kubernetes Engine