The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization or, with custom metrics support, on other application-provided metrics. Horizontal Pod Autoscaling only applies to objects that can be scaled; it cannot be used with objects such as DaemonSets.

The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in the target workload, such as a replication controller or deployment, so that the observed average CPU utilization matches the target specified by the user.
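
The replica calculation the controller performs can be sketched in a few lines of shell. This is only an illustration of the documented formula, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), clamped to the configured minimum and maximum. The function name desired_replicas and the integer-percentage form of the default 10% tolerance check are assumptions made for this sketch; the real controller also accounts for pod readiness, missing metrics and stabilization windows.

```shell
# Sketch of the HPA replica formula (not the controller's actual code).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_CPU_PCT TARGET_CPU_PCT MIN MAX
desired_replicas() {
  current=$1 cpu=$2 target=$3 min=$4 max=$5

  # Within the default 10% tolerance of the target: keep the current count.
  diff=$(( cpu > target ? cpu - target : target - cpu ))
  if [ $(( diff * 10 )) -le "$target" ]; then
    echo "$current"
    return
  fi

  # ceil(current * cpu / target) using integer arithmetic.
  desired=$(( (current * cpu + target - 1) / target ))

  # Clamp to the configured bounds.
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

desired_replicas 1 250 70 1 5   # prints 4: ceil(1 * 250 / 70)
desired_replicas 5 83 70 1 5    # prints 5: ceil(5 * 83 / 70) = 6, capped at max
```

The second call uses the figures from the load test later in this guide: at 83% utilization against a 70% target the formula asks for 6 replicas, but the maximum of 5 caps the result.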

Before you can use the Horizontal Pod Autoscaler on an EKS cluster, Metrics Server must be installed. Follow the guide below for complete installation steps.

Install Kubernetes Metrics Server on Amazon EKS Cluster

Verify the metrics server is functional by using the command below.

$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: "2020-08-12T11:27:13Z"
  name: v1beta1.metrics.k8s.io
  resourceVersion: "130943"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  uid: 83c44e41-6346-4dff-8ce2-aff665199209
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2020-08-12T11:27:18Z"
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available
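
If you only need a yes/no answer rather than the full YAML, the same check can be scripted. The sketch below assumes kubectl is configured for your cluster; the function name metrics_api_available is a hypothetical helper, while the jsonpath filter simply reads the Available condition shown in the output above.

```shell
# Returns success only when the metrics APIService reports Available=True.
# metrics_api_available is a hypothetical helper name used for this sketch.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}
```

For example, running metrics_api_available && kubectl top nodes would print node metrics only once the metrics API is being served.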

Deploy sample app for testing HPA

Let’s deploy a test application to demonstrate how the Horizontal Pod Autoscaler works.

Create the demo namespace:

$ kubectl create ns demo
namespace/demo created

$ kubectl get ns
NAME              STATUS   AGE
default           Active   2d20h
demo              Active   22s
kube-node-lease   Active   2d20h
kube-public       Active   2d20h
kube-system       Active   2d20h

Deploy a sample Apache web server application by running the following command in your terminal.

$ kubectl apply -f https://k8s.io/examples/application/php-apache.yaml -n demo
deployment.apps/php-apache created
service/php-apache created

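You can also use the kubectl run command to deploy the application and create a service. Note that with the run-pod/v1 generator this creates a standalone Pod rather than a Deployment, so the autoscaling steps below require the manifest-based deployment above. The --generator, --requests and --limits flags have also been deprecated in newer kubectl releases.

$ kubectl run php-apache \
  --generator=run-pod/v1 \
  --image=k8s.gcr.io/hpa-example \
  --requests=cpu=200m \
  --limits=cpu=500m \
  --expose \
  --port=80 \
  -n demo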

Check the status of your application.

$ kubectl get pods -n demo
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-wccnj   1/1     Running   0          40s

Create Kubernetes HPA resource

Once the application is running, create the HPA resource.

$ kubectl autoscale deployment php-apache --cpu-percent=70 --min=1 --max=5 -n demo
horizontalpodautoscaler.autoscaling/php-apache autoscaled

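The command above creates an autoscaler that adds Pods when the average CPU utilization across the deployment's Pods, measured as a percentage of their CPU requests, exceeds 70%, and removes them again when it falls below that target. The minimum number of Pods is set to 1 and the maximum to 5.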
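
The same autoscaler can also be kept as a manifest under version control. The sketch below writes a declarative equivalent of the kubectl autoscale command above to a file; the file name php-apache-hpa.yaml is an arbitrary choice. It assumes the autoscaling/v2 API, which is available from Kubernetes 1.23; on older clusters, autoscaling/v2beta2 should accept the same fields.

```shell
# Write a declarative equivalent of the kubectl autoscale command above.
# The file name php-apache-hpa.yaml is an arbitrary choice for this sketch.
cat > php-apache-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
EOF

# Apply it against the cluster (requires kubectl access):
# kubectl apply -f php-apache-hpa.yaml
```

Use either the manifest or the kubectl autoscale command, not both, since they create an HPA with the same name.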

Get details of autoscaler with the following command:

$ kubectl get hpa -n demo
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/70%    1         5         1          80s

$ kubectl describe hpa -n demo
Name:                                                  php-apache
Namespace:                                             demo
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 14 Aug 2020 21:38:12 +0300
Reference:                                             Deployment/php-apache
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (1m) / 70%
Min replicas:                                          1
Max replicas:                                          5
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

Increasing Load

Let us now increase the load on the Service we deployed. For this purpose we use a busybox container to generate requests.

$ kubectl run -it --rm load-generator --image=busybox /bin/sh --generator=run-pod/v1 -n demo

You will be dropped into the container's shell. Run the following while loop, which repeatedly requests the service endpoint at http://php-apache:

/ # while true; do wget -q -O - http://php-apache; done

Open a separate terminal and see how the autoscaler creates more Pods in the deployment as the load increases.

$ kubectl get hpa -n demo
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   83%/70%   1         5         5          9m

As long as the actual CPU percentage is above the target, the autoscaler keeps adding replicas until it reaches the maximum of 5. In this case utilization is 83% against a 70% target, and the deployment has already been scaled to the maximum of 5 REPLICAS.

Stop the load by pressing Ctrl+C in the load generator terminal.

Watch as the autoscaler scales the deployment back down:

$ kubectl get hpa -n demo -w 

It may take several minutes before the number of running Pods drops back to 1. Clean up the setup once done.
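
The delay is expected: by default the controller applies a five-minute stabilization window before scaling down, so that brief dips in load do not cause replicas to flap. If you would rather script the wait than watch the output by hand before cleaning up, a polling loop along these lines could be used. The function name wait_for_replicas and the 15-second default interval are assumptions for this sketch; it reads the HPA's status.currentReplicas field.

```shell
# Poll the HPA until its current replica count matches the expected value.
# Usage: wait_for_replicas HPA_NAME NAMESPACE EXPECTED [INTERVAL_SECONDS]
# wait_for_replicas is a hypothetical helper name used for this sketch.
wait_for_replicas() {
  name=$1 ns=$2 expected=$3 interval=${4:-15}
  while :; do
    replicas=$(kubectl get hpa "$name" -n "$ns" \
      -o jsonpath='{.status.currentReplicas}')
    echo "current replicas: $replicas"
    [ "$replicas" = "$expected" ] && return 0
    sleep "$interval"
  done
}
```

For this walkthrough the call would be wait_for_replicas php-apache demo 1. Note that the loop has no timeout, so interrupt it with Ctrl+C if the count never settles.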

$ kubectl delete -f https://k8s.io/examples/application/php-apache.yaml -n demo
deployment.apps "php-apache" deleted
service "php-apache" deleted

Delete the autoscaler.

$ kubectl delete hpa php-apache -n demo
horizontalpodautoscaler.autoscaling "php-apache" deleted

Lastly, delete the demo namespace.

$ kubectl delete ns demo
namespace "demo" deleted

You can use the same approach to autoscale your own applications with HPA and Metrics Server.
