Advanced Features of Kubernetes' Horizontal Pod Autoscaler (2024)

Most people who use Kubernetes know that you can scale applications with the Horizontal Pod Autoscaler (HPA) based on their CPU or memory usage. HPA has many more features, though, that let you customize the scaling behaviour of your application, such as scaling on custom application metrics or external metrics, as well as alpha/beta features like "scaling to zero" or per-container metrics scaling.

So, in this article we will explore all of these options, so that we can take full advantage of the available features of HPA and get a head start on the features coming in future Kubernetes releases.

Setup

Before we get started with scaling, we first need a testing environment. For that we will use a KinD (Kubernetes in Docker) cluster defined by the following YAML:

# cluster.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAScaleToZero: true
  HPAContainerMetrics: true
  LogarithmicScaleDown: true
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

This manifest configures the KinD cluster with 1 control plane node and 3 workers; additionally, it enables a couple of feature gates related to autoscaling. These feature gates will later allow us to use some alpha/beta features of HPA. To create a cluster with the above configuration, you can run:

kind create cluster --config ./cluster.yaml --name autoscaling --image=kindest/node:v1.23.6

Apart from the cluster, we will also need an application to scale. For that we will use the resource consumer tool and its image, which are used in Kubernetes end-to-end testing. To deploy it, you can run:

kubectl create deployment resource-consumer --image=gcr.io/k8s-staging-e2e-test-images/resource-consumer:1.11
kubectl set resources deployment resource-consumer --requests=cpu=500m,memory=256Mi

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: resource-consumer
  name: resource-consumer
  namespace: default
spec:
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: resource-consumer
EOF

This application is very handy here, as it allows us to simulate CPU and memory consumption of a Pod. It can also expose custom metrics, which we need for scaling based on custom/external metrics. To test this out we can run:

# Consume CPU (300m for 10min):
kubectl run curl --image=curlimages/curl:7.83.1 \
  --rm -it --restart=Never -- \
  curl --data "millicores=300&durationSec=600" http://resource-consumer:8080/ConsumeCPU

# Expose metric "custom_metric" with value 100 for 10min at endpoint /metrics
kubectl run curl --image=curlimages/curl:7.83.1 --rm \
  -it --restart=Never -- \
  curl --data "metric=custom_metric&delta=100&durationSec=600" http://resource-consumer:8080/BumpMetric

Next, we will also need to deploy the services that collect the metrics we will later scale on. The first of these is the Kubernetes metrics-server, which is usually available in a cluster by default; that's not the case in KinD, so to deploy it we need to run:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
kubectl patch -n kube-system deployment metrics-server --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

metrics-server allows us to monitor basic metrics such as CPU and memory usage, but we also want to implement scaling based on custom metrics, such as the ones exposed by an application on its /metrics endpoint, or even external ones like the depth of a queue running outside of the cluster. For these we will need Prometheus and the Prometheus Adapter, which serve the custom and external metrics APIs.
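The adapter translates Prometheus queries into entries in the custom/external metrics APIs through a rule-based config. As a rough sketch only (the actual rules used for this cluster live in the repository's custom-metrics-config-map.yaml), an external-metrics rule might look like:

```yaml
# Sketch of a prometheus-adapter rule set; the real configuration for
# this cluster is in the repository's custom-metrics-config-map.yaml
externalRules:
- seriesQuery: '{__name__=~"^external_.*"}'   # pick up metrics prefixed with "external"
  resources:
    overrides:
      namespace: {resource: "namespace"}      # map the namespace label to the Namespace resource
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```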

You can refer to the end-to-end walkthrough for more details of the setup.

The above requires a lot of setup, so for the purposes of this article and for your convenience, I've made a script and a set of manifests that you can use to spin up a KinD cluster along with all the required components. All you need to do is run the setup.sh script from this repository.

After running the script, we can verify that everything is ready using the following commands:

# To verify availability of metrics run:
kubectl top nodes
# NAME                        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# autoscaling-control-plane   113m         0%     1024Mi          1%
# autoscaling-worker          49m          0%     385Mi           0%
# autoscaling-worker2         42m          0%     381Mi           0%
# autoscaling-worker3         37m          0%     276Mi           0%

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .  # also works with "pods"
# ...
# {
#   "metadata": {
#     "name": "autoscaling-worker3",
#     "labels": { ... }
#   },
#   "window": "20s",
#   "usage": {
#     "cpu": "43077193n",
#     "memory": "283212Ki"
#   }
# }

# To query/verify custom metrics:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .  # also works for "external" instead of "custom"
# ...
#   "name": "pods/custom_metric",
#   "singularName": "",
#   "namespaced": true,
#   "kind": "MetricValueList",
#   "verbs": [ "get" ]
# ...

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
# {
#   "kind": "MetricValueList",
#   "apiVersion": "custom.metrics.k8s.io/v1beta1",
#   "metadata": {"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/custom_metric"},
#   "items": [{
#     "describedObject": {
#       "kind": "Pod",
#       "namespace": "default",
#       "name": "resource-consumer-6bf5898d6f-gzzgm",
#       "apiVersion": "/v1"
#     },
#     "metricName": "custom_metric",
#     "value": "100"
#   }]
# }

More helpful commands can be found in the output of the above-mentioned script or in the repository README.

Basic Autoscaling

Now that we have our infrastructure up and running, we can start scaling the test application. The simplest way to do so is to create an HPA with a command like kubectl autoscale deploy resource-consumer --min=1 --max=5 --cpu-percent=75. This, however, creates an HPA with apiVersion autoscaling/v1, which lacks most of the features.
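For comparison, the object created by that imperative command looks roughly like this (a sketch of the autoscaling/v1 shape, which supports nothing beyond a CPU utilization target):

```yaml
# Roughly what `kubectl autoscale` produces (autoscaling/v1)
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 75   # the only metric v1 can express
```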

So, instead, we will create the HPA from YAML, specifying autoscaling/v2 as the apiVersion:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi

The above HPA will use basic metrics gathered from the application's Pod(s) by metrics-server. To test out the scaling we can simulate heavy memory usage:

kubectl run curl --image=curlimages/curl:7.83.1 \
  --rm -it --restart=Never -- \
  curl --data "megabytes=500&durationSec=600" http://resource-consumer:8080/ConsumeMem

kubectl get hpa -w
# NAME                   REFERENCE                      TARGETS                       MINPODS   MAXPODS   REPLICAS   AGE
# resource-consumer-v2   Deployment/resource-consumer   4689920/200Mi, 0%/75%         1         5         1          81s
# resource-consumer-v2   Deployment/resource-consumer   530415616/200Mi, 0%/75%       1         5         1          2m23s
# resource-consumer-v2   Deployment/resource-consumer   265820160/200Mi, 0%/75%       1         5         3          2m31s
# resource-consumer-v2   Deployment/resource-consumer   212226867200m/200Mi, 0%/75%   1         5         5          5m50s

Custom Metrics

Scaling based on CPU and memory usage is often enough, but we're after the advanced scaling options. The first of these is scaling using custom metrics exposed by an application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 1
  maxReplicas: 5
  metrics:
  # kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
  - type: Pods
    pods:
      metric:
        name: custom_metric
      target:
        type: AverageValue
        averageValue: 100

This HPA is configured to scale the application based on the value of custom_metric that was scraped by Prometheus from the application's /metrics endpoint. It will scale the application up if the average value of the specified metric across all Pods (.target.type: AverageValue) goes over 100.

The above uses a Pods metric to scale, but it's possible to specify any other object that has a metric attached to it:

# ...
  - type: Object
    object:
      metric:
        name: custom_metric
      describedObject:
        apiVersion: v1
        kind: Service
        name: resource-consumer
      target:
        type: Value
        value: 100

This snippet achieves the same as the previous one, this time, however, using a Service instead of a Pod as the source of the metric. It also shows that you can compare against the metric value directly by setting .target.type to Value instead of AverageValue.

To figure out which objects expose metrics that you can use for scaling, you can traverse the API using kubectl get --raw. For example, to look up custom_metric for either a Pod or a Service you can use:

# Pod
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom_metric" | jq .
# Service
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/resource-consumer/custom_metric" | jq .
# Everything
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .

Also, to help you troubleshoot, the HPA object provides a status stanza that shows whether the applied metric was recognized:

kubectl get hpa resource-consumer-v2-custom -o json | jq .status.conditions
# [
# ...
#   {
#     "lastTransitionTime": "2022-05-17T12:36:03Z",
#     "message": "the HPA was able to successfully calculate a replica count from pods metric custom_metric",
#     "reason": "ValidMetricFound",
#     "status": "True",
#     "type": "ScalingActive"
#   },
# ...
# ]

Finally, to test out the behavior of the above HPA, we can bump the metric exposed by the application and see how the application scales up:

# Raise custom_metric to 150
kubectl run curl --image=curlimages/curl:7.83.1 \
  --rm -it --restart=Never -- curl \
  --data "metric=custom_metric&delta=150&durationSec=600" http://resource-consumer:8080/BumpMetric

kubectl get hpa -w
# NAME                          REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# resource-consumer-v2-custom   Deployment/resource-consumer   0/100     1         5         1          10s
# resource-consumer-v2-custom   Deployment/resource-consumer   150/100   1         5         1          24s
# resource-consumer-v2-custom   Deployment/resource-consumer   150/100   1         5         2          40s
# resource-consumer-v2-custom   Deployment/resource-consumer   150/100   1         5         5          75s

External Metrics

To show the full potential of HPA, we will also try scaling an application based on an external metric. This would normally require scraping metrics from an external system running outside of the cluster, such as Kafka or PostgreSQL. We don't have that available, so instead we've configured Prometheus Adapter to treat certain metrics as external. The configuration that does this can be found [here](https://github.com/MartinHeinz/metrics-on-kind/blob/master/custom-metrics-config-map.yaml). All you need to know, though, is that in this test cluster any application metric prefixed with external goes to the external metrics API. To test this out, we bump up such a metric and check that the API gets populated:

# Set external_queue_messages_ready to 150 for 10min
kubectl run curl --image=curlimages/curl:7.83.1 \
  --rm -it --restart=Never -- \
  curl --data "metric=external_queue_messages_ready&delta=150&durationSec=600" \
  http://resource-consumer:8080/BumpMetric

kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/default/external_queue_messages_ready | jq .
# {
#   "kind": "ExternalMetricValueList",
#   "apiVersion": "external.metrics.k8s.io/v1beta1",
#   "items": [
#     {
#       "metricName": "external_queue_messages_ready",
#       "value": "150"
#     }
#   ]
# }

To then scale our deployment based on this metric, we can use the following HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-external
spec:
# ...
  metrics:
  - type: External
    external:
      metric:
        name: external_queue_messages_ready
      target:
        type: Value
        value: 100

HPAScaleToZero

Now that we've gone through all the well-known features of HPA, let's also take a look at the alpha/beta ones that we enabled using feature gates. The first of these is HPAScaleToZero.

As the name suggests, this allows you to set minReplicas in an HPA to zero, effectively turning the service off when there's no traffic. This can be useful for "bursty" workloads, for example when your application receives data from an external queue. In this use case the application can be safely scaled to zero when there are no messages waiting to be processed.

With the feature gate enabled we can simply run:

kubectl patch hpa resource-consumer-v2-external -p '{"spec":{"minReplicas": 0}}'

This sets the minimum replicas of the previously shown HPA to zero.

Be aware, though, that this only works for metrics of type External or Object.
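Putting it together, a scale-to-zero HPA built on the external metric from the previous section could look like this (a sketch that mirrors the earlier resource-consumer-v2-external example, with minReplicas set to 0):

```yaml
# Sketch: the earlier external-metric HPA with scale-to-zero enabled;
# requires the HPAScaleToZero feature gate
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-external
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  minReplicas: 0        # allowed only with the feature gate enabled
  maxReplicas: 5
  metrics:
  - type: External      # scale-to-zero works only with External/Object metrics
    external:
      metric:
        name: external_queue_messages_ready
      target:
        type: Value
        value: 100
```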

HPAContainerMetrics

Another feature gate we can make use of is HPAContainerMetrics, which allows us to use metrics of type: ContainerResource:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: resource-consumer-v2-container
spec:
# ...
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: resource-consumer
      target:
        type: Utilization
        averageUtilization: 75

This makes it possible to scale based on resource utilization of individual containers rather than the whole Pod. This can be useful if you have a multi-container Pod with an application container and a sidecar, and you want to ignore the sidecar and scale the deployment only on the application container's usage.
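For illustration, a hypothetical multi-container Pod template could look like the following (the log-shipper sidecar and its image are made-up names, not part of our test setup); with the ContainerResource metric above, only the resource-consumer container's CPU would drive scaling:

```yaml
# Hypothetical Deployment pod template; the sidecar is illustrative only
spec:
  template:
    spec:
      containers:
      - name: resource-consumer       # this container's CPU drives scaling
        image: gcr.io/k8s-staging-e2e-test-images/resource-consumer:1.11
        resources:
          requests:
            cpu: 500m
      - name: log-shipper             # sidecar, ignored by the ContainerResource metric
        image: fluent/fluent-bit:1.9
        resources:
          requests:
            cpu: 100m
```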

You can also view the per-container breakdown of Pod metrics by running the following command:

POD=$(kubectl get pod -l app=resource-consumer -o jsonpath="{.items[0].metadata.name}")
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/$POD" | jq .
# {
#   "kind": "PodMetrics",
#   "apiVersion": "metrics.k8s.io/v1beta1",
#   "metadata": {
#     "name": "resource-consumer-6bf5898d6f-gzzgm",
#     "namespace": "default"
#   },
#   "window": "16s",
#   "containers": [{
#     "name": "resource-consumer",
#     "usage": { "cpu": "0", "memory": "11028Ki" }
#   }]
# }

LogarithmicScaleDown

Last but not least is the LogarithmicScaleDown feature flag.

Without this feature, the Pod that's been running for the least amount of time gets deleted first during downscaling. That's not always ideal, though, as it can create an imbalance in replica distribution, because newer Pods tend to serve less traffic than older ones.

With this feature flag enabled, a semi-random selection is used instead when choosing the Pod to be deleted.

For a full rationale and algorithm details see KEP-2189.

Closing Thoughts

In this article, I tried to cover most of the things you can do with the Kubernetes HPA to scale your applications. There are, however, many more tools and options for scaling applications running on Kubernetes, such as the Vertical Pod Autoscaler, which can help keep Pod resource requests and limits up to date.
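For completeness, a minimal VerticalPodAutoscaler object looks roughly like this (a sketch; it assumes the VPA components are installed, which is not the case in our test cluster):

```yaml
# Sketch: requires the VPA recommender/updater/admission controller to be deployed
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: resource-consumer
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-consumer
  updatePolicy:
    updateMode: "Auto"   # let VPA evict and recreate Pods with updated requests
```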

Another option would be the predictive HPA by DigitalOcean, which tries to predict how many replicas an application should have.

Finally, autoscaling doesn't end with Pods: the next step after setting up Pod autoscaling is to also set up cluster autoscaling, to avoid running out of available resources in your whole cluster.
