CKAD theory pt.4 - Observability
This is part 4 of my personal notes, written while preparing for the CKAD exam.
Topics
- Understand LivenessProbes and ReadinessProbes
- Understand container logging
- Monitoring applications
- Debugging
Liveness and Readiness Probes
Probes allow you to change how k8s treats the state of your containers. Using probes, you can control when k8s starts sending traffic to a container and when a container is considered dead and should be restarted. You can change what is checked, and how often, using specific handlers.
There are three types of checks that k8s can perform:
- ExecAction - executes a specified command inside the container. The check succeeds if the exit code is 0.
- TCPSocketAction - performs a TCP check against a specified container port. The check succeeds if the port is open.
- HTTPGetAction - performs an HTTP GET request against the container on a specified port and path. The check succeeds if the HTTP status code is at least 200 and below 400.
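As an illustration, the three handler types look like this in a Pod spec (a sketch; the images, ports and paths are illustrative, not from the original notes):

```yaml
spec:
  containers:
  - name: app
    image: busybox            # illustrative image
    livenessProbe:
      exec:                    # ExecAction: run a command in the container
        command: ["cat", "/tmp/healthy"]
  - name: cache
    image: redis              # illustrative image
    livenessProbe:
      tcpSocket:               # TCPSocketAction: check that the port is open
        port: 6379
  - name: web
    image: nginx              # illustrative image
    livenessProbe:
      httpGet:                 # HTTPGetAction: expect a 2xx/3xx status
        path: /healthz
        port: 80
```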
Liveness probe
`livenessProbe` - performs a sanity check that the application is running and in a normal operational state. If the check fails, the kubelet kills the container.

```yaml
spec:
  containers:
  - name: nginx-1
    image: nginx
    livenessProbe:
      httpGet:
        path: /healthy.html
        port: 80
```
Readiness probe
`readinessProbe` - performs a ready-to-serve check that the application is ready to handle requests. When the readiness probe succeeds, the container's IP address is added to the Service Endpoints.

```yaml
spec:
  containers:
  - name: nginx-1
    image: nginx
    readinessProbe:
      httpGet:
        path: /index.html
        port: 80
```
Probe options
Probes have a number of options that can change their behaviour:
- initialDelaySeconds - number of seconds to wait before the periodic checks start running
- periodSeconds - how often (in seconds) to perform the check. Default=10, min=1
- timeoutSeconds - number of seconds after which the probe times out. Default=1, min=1
- successThreshold - minimum consecutive successes for the probe to be considered successful after a failure. Default=1, min=1; must be 1 for liveness
- failureThreshold - minimum consecutive failures for the probe to be considered failed. On failure, liveness=restart, readiness=unready. Default=3, min=1
HTTP probe options:
- host - host name to connect to; defaults to the pod IP
- scheme - HTTP or HTTPS; defaults to HTTP
- path - path in the URL
- httpHeaders - custom headers to set in the request
- port - port number, between 1 and 65535
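The options above can be combined on a single probe. A sketch, reusing the nginx readiness probe from earlier (the custom header is a hypothetical example):

```yaml
spec:
  containers:
  - name: nginx-1
    image: nginx
    readinessProbe:
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
        httpHeaders:
        - name: X-Probe-Source   # hypothetical custom header
          value: kubelet
      initialDelaySeconds: 5     # wait 5s before the first check
      periodSeconds: 10          # then check every 10s
      timeoutSeconds: 2          # each attempt times out after 2s
      successThreshold: 1
      failureThreshold: 3        # mark unready after 3 consecutive failures
```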
Container logging
In k8s it is expected that applications will write their logs to stdout. This output is captured by k8s and can be accessed using `kubectl logs <podname>`. In case there are many containers in the pod, use `kubectl logs <podname> -c <container>`.
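For illustration, a few common `kubectl logs` invocations (the pod and container names are hypothetical):

```shell
# Stream logs as they are written
kubectl logs -f my-pod

# Logs from a specific container in a multi-container pod
kubectl logs my-pod -c my-container

# Logs from the previous, crashed instance of the container
kubectl logs my-pod --previous

# Only the last 20 lines
kubectl logs my-pod --tail=20
```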
An important thing to note is that k8s doesn't keep the logs if a container crashes, a pod is evicted or a node dies - it's expected that you'll use an external solution for that. The main options are:
- Elasticsearch and Kibana
- StackDriver - GKE default
- Azure Monitor - Azure default
Monitoring applications
The Kubernetes metrics server provides an API which allows you to access data about your pods and nodes, such as CPU and memory usage. By default, k8s does not have the metrics server installed and you have to install it on your own.
The metrics server provides a limited set of metrics, but it's lightweight.
For a complete, rich set of metrics, consider installing Prometheus.
Installing Metrics Server
One way to install the k8s metrics server is to use the following commands:

```shell
DOWNLOAD_URL=$(curl --silent "https://api.github.com/repos/kubernetes-incubator/metrics-server/releases/latest" | jq -r .tarball_url)
DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)
curl -Ls $DOWNLOAD_URL -o metrics-server-$DOWNLOAD_VERSION.tar.gz
mkdir metrics-server-$DOWNLOAD_VERSION
tar -xzf metrics-server-$DOWNLOAD_VERSION.tar.gz --directory metrics-server-$DOWNLOAD_VERSION --strip-components 1
kubectl apply -f metrics-server-$DOWNLOAD_VERSION/deploy/1.8+/
```
Give it about 5 minutes to deploy, then verify that metrics are being served by issuing the following command:

```shell
kubectl get --raw /apis/metrics.k8s.io/
```
Monitoring application utilization
With the metrics server up and running, you can use the following commands:
- `kubectl top nodes` - show CPU/memory utilization across the nodes in your cluster
- `kubectl top pods` - show information about all pods in the default namespace
- `kubectl top pod <podname>` - show information about a single pod
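A couple of useful variations of these commands (the namespace and pod names are hypothetical):

```shell
# Per-container breakdown of a single pod's usage
kubectl top pod my-pod --containers

# Pods in a specific namespace rather than the default one
kubectl top pods -n kube-system
```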
Debugging
Debugging an application consists of 2 steps:
- Find the issue
- Fix the issue
When an application is not working, or is not working as expected, start looking for the issue by checking PODs, Replication Controllers and Services. It's also worth checking node utilization, volumes and port bindings.
According to a survey, the top 10 reasons for a POD to fail are:
- Wrong Container Image / Invalid Registry Permissions
  - The image tag is incorrect
  - The image doesn't exist
  - Kubernetes doesn't have permissions to pull that image
  - Errors: `ErrImagePull` or `ImagePullBackOff`
- Application Crashing after Launch
  - Incorrect permissions in the POD
  - Missing environment variables
  - Missing mounted volumes
  - Errors: `CrashLoopBackOff`
- Missing ConfigMap or Secret
  - Errors: `RunContainerError`, hung in `ContainerCreating`
- Liveness/Readiness Probe Failure
  - Incorrect health URL
  - Incorrect startup timeout
  - Database misconfiguration
  - Errors: `NotReady`, `Terminated`
- Exceeding CPU/Memory Limits
  - Errors: `FailedCreate`
- Resource Quotas
  - Errors: `FailedCreate`, `Terminated`
- Insufficient Cluster Resources
  - Even if resources are not fully utilized, they can be fully accounted for by the Scheduler
  - Errors: `FailedScheduling`
- PersistentVolume fails to mount
  - No free PV
  - PV not accessible by the POD
  - Errors: pod hung in `ContainerCreating`
- Validation Errors
  - Syntax errors in the object definition
  - Incorrect indentation
- Container Image not updating
  - Incorrect image pull policy, i.e. `IfNotPresent` results in images not being pulled if the same tag already exists on the node
Debugging a POD
The most important commands to start your POD troubleshooting:
- `kubectl get pods` - show basic POD info. Worth combining with the `-o wide` and `-o yaml` flags
- `kubectl describe pod <podname>` - provide detailed information on the POD and latest events
- `kubectl logs <podname>` - print stdout from the POD

When you've found an issue, you can modify the object on-the-fly by issuing the `kubectl edit` command. However, not all objects can be modified in place. Though deprecated, `kubectl get --export` can provide a clean YAML that can be reapplied later.
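A sketch of that edit/re-apply workflow (the pod name is hypothetical):

```shell
# Edit the live object in your $EDITOR; changes apply on save
kubectl edit pod my-pod

# For fields that cannot be changed in place:
# export a clean YAML, delete the object, then re-apply
kubectl get pod my-pod -o yaml --export > my-pod.yaml
kubectl delete pod my-pod
kubectl apply -f my-pod.yaml
```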
Debugging a Service
To troubleshoot a Service, use the following commands:
- `kubectl get service`
- `kubectl describe endpoints`
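A sketch of checking whether a Service actually has backing endpoints (the service name is hypothetical):

```shell
# Does the Service exist, and what is its ClusterIP and port mapping?
kubectl get service my-service

# Are there ready pod IPs behind it? An empty address list means the
# selector matches no ready pods (check labels and readiness probes).
kubectl describe endpoints my-service
```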
Links
- https://blog.nillsf.com/index.php/2019/08/01/ckad-part-5-observability/
- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
- https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/
- https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
- https://kukulinski.com/10-most-common-reasons-kubernetes-deployments-fail-part-1/