HorizontalPodAutoscaler Walkthrough (2024)

A HorizontalPodAutoscaler (HPA for short) automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.

If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.

This document walks you through an example of enabling HorizontalPodAutoscaler to automatically manage scale for an example web app. This example workload is Apache httpd running some PHP code.

Before you begin

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use an online Kubernetes playground.

Your Kubernetes server must be at or later than version 1.23. To check the version, enter kubectl version. If you're running an older release of Kubernetes, refer to the version of the documentation for that release (see available documentation versions).

To follow this walkthrough, you also need to use a cluster that has a Metrics Server deployed and configured. The Kubernetes Metrics Server collects resource metrics from the kubelets in your cluster, and exposes those metrics through the Kubernetes API, using an APIService to add new kinds of resource that represent metric readings.

To learn how to deploy the Metrics Server, see the metrics-server documentation.

If you are running Minikube, run the following command to enable metrics-server:

minikube addons enable metrics-server

Run and expose php-apache server

To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the hpa-example image, and expose it as a Service using the following manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

To do so, run the following command:

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
deployment.apps/php-apache created
service/php-apache created

Create the HorizontalPodAutoscaler

Now that the server is running, create the autoscaler using kubectl. The kubectl autoscale subcommand, part of kubectl, helps you do this.

You will shortly run a command that creates a HorizontalPodAutoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that you created in the first step of these instructions.

Roughly speaking, the HPA controller will increase and decrease the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%. The Deployment then updates the ReplicaSet (this is part of how all Deployments work in Kubernetes), and then the ReplicaSet either adds or removes Pods based on the change to its .spec.

Since each Pod requests 200 milli-cores (as set in the Deployment manifest), this means an average CPU usage of 100 milli-cores. See Algorithm details for more details on the algorithm.
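The 100 milli-core figure follows directly from the CPU request and the target percentage. Here is a minimal sketch of that arithmetic; the function name target_cpu_millicores is hypothetical and exists only for this illustration:

```python
def target_cpu_millicores(request_millicores: int, target_utilization_percent: int) -> float:
    """Average per-Pod CPU usage (in milli-cores) that matches the utilization target."""
    return request_millicores * target_utilization_percent / 100


# Each php-apache Pod requests 200m of CPU; the autoscaler targets 50% utilization.
print(target_cpu_millicores(200, 50))  # 100.0
```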

Create the HorizontalPodAutoscaler:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled

You can check the current status of the newly-made HorizontalPodAutoscaler by running:

# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
kubectl get hpa

The output is similar to:

NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          18s

(If you see other HorizontalPodAutoscalers with different names, that means they already existed, and isn't usually a problem.)

Please note that the current CPU consumption is 0% as there are no clients sending requests to the server (the TARGET column shows the average across all the Pods controlled by the corresponding deployment).

Increase the load

Next, see how the autoscaler reacts to increased load. To do this, you'll start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the php-apache service.

# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Now run:

# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch

Within a minute or so, you should see the higher CPU load; for example:

NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        1          3m

and then, more replicas. For example:

NAME         REFERENCE                     TARGET       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%   1         10        7          3m

Here, CPU consumption has increased to 305% of the request. As a result, the Deployment was resized to 7 replicas:

kubectl get deployment php-apache

You should see the replica count matching the figure from the HorizontalPodAutoscaler:

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m

Note:

It may take a few minutes to stabilize the number of replicas. Since the amount of load is not controlled in any way, it may happen that the final number of replicas will differ from this example.
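The jump from 1 to 7 replicas in the sample output is consistent with the ratio-based calculation described in the algorithm details. The sketch below is a simplified illustration only: the function name desired_replicas is hypothetical, and the real controller also applies a tolerance, readiness checks, and stabilization that are not modelled here.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Scale the replica count by the ratio of observed to target metric, rounding up."""
    return math.ceil(current_replicas * current_value / target_value)


# One replica running at 305% utilization against a 50% target.
print(desired_replicas(1, 305, 50))  # 7
```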

Stop generating load

To finish the example, stop sending the load.

In the terminal where you created the Pod that runs a busybox image, terminate the load generation by typing <Ctrl> + C.

Then verify the result state (after a minute or so):

# type Ctrl+C to end the watch when you're ready
kubectl get hpa php-apache --watch

The output is similar to:

NAME         REFERENCE                     TARGET     MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%   1         10        1          11m

and the Deployment also shows that it has scaled down:

kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m

Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.

Autoscaling the replicas may take a few minutes.
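The scale-down stops at 1 rather than 0 because the proposed replica count is kept within the configured minimum and maximum. A minimal sketch of that bounding step, using a hypothetical helper named clamp_replicas and the --min=1 and --max=10 values from this walkthrough:

```python
def clamp_replicas(proposed: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a proposed replica count inside the autoscaler's configured bounds."""
    return max(min_replicas, min(proposed, max_replicas))


# With 0% utilization the ratio-based proposal is 0, but the minimum is 1.
print(clamp_replicas(0, 1, 10))   # 1
# A proposal above the maximum is capped at 10.
print(clamp_replicas(25, 1, 10))  # 10
```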

Autoscaling on multiple metrics and custom metrics

You can introduce additional metrics to use when autoscaling the php-apache Deployment by making use of the autoscaling/v2 API version.

First, get the YAML of your HorizontalPodAutoscaler in the autoscaling/v2 form:

kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml

Open the /tmp/hpa-v2.yaml file in an editor, and you should see YAML which looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0

Notice that the targetCPUUtilizationPercentage field has been replaced with an array called metrics. The CPU utilization metric is a resource metric, since it is represented as a percentage of a resource specified on pod containers. Notice that you can specify other resource metrics besides CPU. By default, the only other supported resource metric is memory. These resources do not change names from cluster to cluster, and should always be available, as long as the metrics.k8s.io API is available.

You can also specify resource metrics in terms of direct values, instead of as percentages of the requested value, by using a target.type of AverageValue instead of Utilization, and setting the corresponding target.averageValue field instead of the target.averageUtilization.

  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi

There are two other types of metrics, both of which are considered custom metrics: pod metrics and object metrics. These metrics may have names which are cluster specific, and require a more advanced cluster monitoring setup.

The first of these alternative metric types is pod metrics. These metrics describe Pods, and are averaged together across Pods and compared with a target value to determine the replica count. They work much like resource metrics, except that they only support a target type of AverageValue.

Pod metrics are specified using a metric block like this:

type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k

The second alternative metric type is object metrics. These metrics describe a different object in the same namespace, instead of describing Pods. The metrics are not necessarily fetched from the object; they only describe it. Object metrics support target types of both Value and AverageValue. With Value, the target is compared directly to the returned metric from the API. With AverageValue, the value returned from the custom metrics API is divided by the number of Pods before being compared to the target. The following example is the YAML representation of the requests-per-second metric.

type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
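The difference between the Value and AverageValue target types for object metrics can be seen with a little arithmetic. The sketch below uses hypothetical function names and an assumed reading of 6000 requests per second across 2 Pods; a ratio above 1 suggests scaling up, and a ratio below 1 suggests scaling down.

```python
def ratio_for_value(metric_value: float, target_value: float) -> float:
    """Value target: compare the returned metric directly with the target."""
    return metric_value / target_value


def ratio_for_average_value(metric_value: float, pod_count: int, target_average: float) -> float:
    """AverageValue target: divide the returned metric by the Pod count first."""
    return (metric_value / pod_count) / target_average


# Assumed reading: the Ingress reports 6000 requests per second, served by 2 Pods.
print(ratio_for_value(6000, 2000))             # 3.0
print(ratio_for_average_value(6000, 2, 2000))  # 1.5
```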

If you provide multiple such metric blocks, the HorizontalPodAutoscaler will consider each metric in turn. The HorizontalPodAutoscaler will calculate proposed replica counts for each metric, and then choose the one with the highest replica count.

For example, if you had your monitoring system collecting metrics about network traffic, you could update the definition above using kubectl edit to look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
status:
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      current:
        value: 10k

Then, your HorizontalPodAutoscaler would attempt to ensure that each pod was consuming roughly 50% of its requested CPU, serving 1000 packets per second, and that all pods behind the main-route Ingress were serving a total of 10000 requests per second.
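To illustrate how the highest proposal wins when several metrics are configured, here is a simplified sketch. The current readings (60% CPU, 1500 packets per second per Pod, 25000 requests per second in total) are invented for the example, and the helper name propose is hypothetical; tolerance and stabilization behaviour are left out.

```python
import math


def propose(current_replicas: int, current_value: float, target_value: float) -> int:
    """Ratio-based replica proposal for a single metric, rounded up."""
    return math.ceil(current_replicas * current_value / target_value)


current_replicas = 2
proposals = {
    # Assumed readings, compared with the targets from the manifest above.
    "cpu": propose(current_replicas, 60, 50),                         # 3
    "packets-per-second": propose(current_replicas, 1500, 1000),      # 3
    "requests-per-second": propose(current_replicas, 25000, 10000),   # 5
}

# The autoscaler picks the metric that asks for the most replicas.
print(max(proposals.values()))  # 5
```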

Autoscaling on more specific metrics

Many metrics pipelines allow you to describe metrics either by name or by a set of additional descriptors called labels. For all non-resource metric types (pod, object, and external, described below), you can specify an additional label selector which is passed to your metric pipeline. For instance, if you collect a metric http_requests with the verb label, you can specify the following metric block to scale only on GET requests:

type: Object
object:
  metric:
    name: http_requests
    selector: {matchLabels: {verb: GET}}

This selector uses the same syntax as the full Kubernetes label selectors. The monitoring pipeline determines how to collapse multiple series into a single value, if the name and selector match multiple series. The selector is additive, and cannot select metrics that describe objects that are not the target object (the target pods in the case of the Pods type, and the described object in the case of the Object type).
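As a rough illustration of what a matchLabels selector does, the sketch below filters a few made-up http_requests series down to the ones labelled verb: GET. The series data and the helper name matches are hypothetical, and summing the selected values is only one possible way to collapse them; your monitoring pipeline decides what actually happens.

```python
def matches(labels: dict, match_labels: dict) -> bool:
    """True when every key/value pair in the selector is present on the series."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Invented sample series for a metric named http_requests.
series = [
    {"labels": {"verb": "GET", "code": "200"}, "value": 120},
    {"labels": {"verb": "GET", "code": "500"}, "value": 5},
    {"labels": {"verb": "POST", "code": "200"}, "value": 40},
]

selected = [s for s in series if matches(s["labels"], {"verb": "GET"})]
print(len(selected))                       # 2
print(sum(s["value"] for s in selected))   # 125
```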

Autoscaling on metrics not related to Kubernetes objects

Applications running on Kubernetes may need to autoscale based on metrics that don't have an obvious relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case with external metrics.

Using external metrics requires knowledge of your monitoring system; the setup is similar to that required when using custom metrics. External metrics allow you to autoscale your cluster based on any metric available in your monitoring system. Provide a metric block with a name and selector, as above, and use the External metric type instead of Object. If multiple time series are matched by the metricSelector, the sum of their values is used by the HorizontalPodAutoscaler. External metrics support both the Value and AverageValue target types, which function exactly the same as when you use the Object type.

For example, if your application processes tasks from a hosted queue service, you could add the following section to your HorizontalPodAutoscaler manifest to specify that you need one worker per 30 outstanding tasks.

- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
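The "one worker per 30 outstanding tasks" rule amounts to dividing the total queue depth by 30 and rounding up. In this sketch the two queue readings are invented, the helper name workers_needed is hypothetical, and the matched series are summed as described above for external metrics.

```python
import math


def workers_needed(series_values: list, tasks_per_worker: int) -> int:
    """Sum the matched series, then allow one worker per tasks_per_worker tasks."""
    total_tasks = sum(series_values)
    return math.ceil(total_tasks / tasks_per_worker)


# Two matched time series reporting 40 and 60 ready messages (assumed values).
print(workers_needed([40, 60], 30))  # 4
```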

When possible, it's preferable to use the custom metric target types instead of external metrics, since it's easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows access to any metric, so cluster administrators should take care when exposing it.

Appendix: Horizontal Pod Autoscaler Status Conditions

When using the autoscaling/v2 form of the HorizontalPodAutoscaler, you will be able to see status conditions set by Kubernetes on the HorizontalPodAutoscaler. These status conditions indicate whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted in any way.

The conditions appear in the status.conditions field. To see the conditions affecting a HorizontalPodAutoscaler, we can use kubectl describe hpa:

kubectl describe hpa cm-test
Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited  False   DesiredWithinRange  the desired replica count is within the acceptable range
Events:

For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first, AbleToScale, indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling. The second, ScalingActive, indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and is able to calculate desired scales. When it is False, it generally indicates problems with fetching metrics. Finally, the last condition, ScalingLimited, indicates that the desired scale was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that you may wish to raise or lower the minimum or maximum replica count constraints on your HorizontalPodAutoscaler.
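If you read status.conditions programmatically, the healthy state described above can be checked with a few lines of code. This is only a sketch: the conditions list is hand-copied from the sample output rather than fetched from a cluster, and the helper name unhealthy_conditions is hypothetical.

```python
# Status values that the walkthrough describes as healthy for each condition type.
HEALTHY_STATUS = {
    "AbleToScale": "True",
    "ScalingActive": "True",
    "ScalingLimited": "False",
}


def unhealthy_conditions(conditions: list) -> list:
    """Return the types of known conditions whose status differs from the healthy one."""
    return [
        c["type"]
        for c in conditions
        if c["type"] in HEALTHY_STATUS and c["status"] != HEALTHY_STATUS[c["type"]]
    ]


# Conditions as shown in the sample kubectl describe output.
conditions = [
    {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
    {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
    {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"},
]

print(unhealthy_conditions(conditions))  # []
```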

Quantities

All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using a special whole-number notation known in Kubernetes as a quantity. For example, the quantity 10500m would be written as 10.5 in decimal notation. The metrics APIs will return whole numbers without a suffix when possible, and will generally return quantities in milli-units otherwise. This means you might see your metric value fluctuate between 1 and 1500m, or 1 and 1.5 when written in decimal notation.
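To make the notation concrete, here is a small sketch that converts the quantities used in this walkthrough into decimal numbers. The helper name parse_quantity is hypothetical, and it deliberately handles only the m (milli) and k (kilo) suffixes plus plain numbers; real Kubernetes quantities support more suffixes than this.

```python
from fractions import Fraction

# Only the two suffixes that appear in this walkthrough are handled here.
_SUFFIX_MULTIPLIERS = {"m": Fraction(1, 1000), "k": Fraction(1000)}


def parse_quantity(text: str) -> Fraction:
    """Convert a quantity such as '10500m', '1k', or '1' into an exact number."""
    suffix = text[-1:]
    if suffix in _SUFFIX_MULTIPLIERS:
        return Fraction(text[:-1]) * _SUFFIX_MULTIPLIERS[suffix]
    return Fraction(text)


print(float(parse_quantity("10500m")))  # 10.5
print(float(parse_quantity("1500m")))   # 1.5
print(float(parse_quantity("1")))       # 1.0
print(float(parse_quantity("1k")))      # 1000.0
```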

Other possible scenarios

Creating the autoscaler declaratively

Instead of using the kubectl autoscale command to create a HorizontalPodAutoscaler imperatively, we can use the following manifest to create it declaratively:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Then, create the autoscaler by executing the following command:

kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
horizontalpodautoscaler.autoscaling/php-apache created
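If you would rather generate the manifest than write it by hand, the same structure can be built in code and emitted as JSON, which kubectl accepts alongside YAML. This is a sketch under that assumption; the function name hpa_manifest is hypothetical, and the field values mirror the manifest above.

```python
import json


def hpa_manifest(name: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler targeting a Deployment of the same name."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                    },
                }
            ],
        },
    }


manifest = hpa_manifest("php-apache", 1, 10, 50)
print(json.dumps(manifest, indent=2))
```

You could save the printed JSON to a file and pass that file to kubectl apply -f, in the same way as the YAML manifest.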
Article information

Author: Geoffrey Lueilwitz
