Troubleshooting Kinetic Platform Installations in a Kubernetes Environment

Overview

This guide covers common questions and troubleshooting scenarios for working with the Kinetic Platform and its associated Kubernetes environment.

Common Concepts

Before you begin, ensure you have kubectl installed and configured. kubectl is a command line tool for interacting with the Kubernetes control plane, which manages the cluster and the workloads deployed into it. You'll use kubectl to connect to the Kubernetes pods that run the Platform.
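
Before troubleshooting, it can help to confirm kubectl can reach the cluster at all. The commands below are a minimal sanity check; this guide assumes the Platform is installed in a namespace named kinetic, so substitute your own namespace if it differs.

# Show which cluster/context kubectl is currently pointed at
kubectl config current-context

# Confirm the cluster is reachable by listing its nodes
kubectl get nodes

# Confirm the namespace that holds the Kinetic Platform exists
kubectl get namespaces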

Here are a few definitions of Kubernetes functions and how they relate to a Kinetic Platform installation:

  • Namespace: Namespaces provide logical separation and are used to group resources within the cluster. Your company's Spaces are all stored in a particular namespace.
  • Pod: Pods are the smallest deployable units that can be created in Kubernetes. They are typically not deployed directly; instead, they are usually deployed via workload resources such as Deployments. A pod can comprise one or more containers.
  • Container: Containers live inside pods. They host Kinetic system components such as Core, Task, and Indexer.
  • Deployment: Used to group and manage pods. A Deployment holds the specification for a pod and keeps the desired number of replicas running. There are other resources for pod management, but Deployments are most often used for troubleshooting.
  • Secrets: Secrets are values a container requires that can't be stored in plain text. Secrets are created separately and then made available to a container, typically as environment variables or mounted files.
  • Config maps: Kubernetes objects that store configuration information that is safe to keep in plain text. Config maps are often preloaded into the Kinetic applications.
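
Each of these resource types can be listed with kubectl. The sketch below assumes the Platform lives in a namespace named kinetic; adjust the namespace to match your installation.

# List the namespaces in the cluster
kubectl get namespaces

# List the pods, deployments, secrets, and config maps in the kinetic namespace
kubectl get pods -n kinetic
kubectl get deployments -n kinetic
kubectl get secrets -n kinetic
kubectl get configmaps -n kinetic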

Troubleshooting Process

The most common workflow for troubleshooting a Kinetic installation follows these steps:

  1. Obtaining resource information.
  2. Obtaining and reviewing log files.
  3. Debugging the affected resource using kubectl commands like logs, exec, describe, and get. The Common Kubernetes Commands guide includes some frequently used commands, and a quick reference for these four follows this list.
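
As a quick reference, here is the general shape of those four commands. The pod, container, and namespace names are placeholders; use the values from your own environment.

# Show detailed information and recent events for a pod
kubectl describe pod <POD> -n <NAMESPACE>

# Print the logs a container writes to stdout/stderr
kubectl logs <POD> -c <CONTAINER> -n <NAMESPACE>

# Run a command inside a container (here, an interactive shell)
kubectl exec -it <POD> -c <CONTAINER> -n <NAMESPACE> -- sh

# List resources of a given type, such as pods
kubectl get pods -n <NAMESPACE>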

Because every environment is different, there are no standard instructions for resolving a specific issue. This guide is designed to help you narrow down potential causes and, if necessary, gather the information Kinetic Support needs to help resolve your issue.

Obtaining Resource Information

To begin the troubleshooting process, determine the state of the resources by running the get pods command. With the -o wide flag, this returns information such as each pod's readiness, status, restart count, age, IP address, and the node it is running on.

kubectl get pods -o wide

Two columns are of particular importance: Ready and Status. Ready shows the number of ready containers and the total number of containers in the pod, separated by a slash (/). If fewer containers are ready than the total, this can indicate there is an issue with the listed pod.
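
If the Platform is installed in its own namespace (this guide assumes one named kinetic), you can scope the command to that namespace and, optionally, watch for status changes as they happen.

# Scope to the kinetic namespace and show node and IP details
kubectl get pods -n kinetic -o wide

# Watch pod status changes in real time (Ctrl-C to stop)
kubectl get pods -n kinetic --watch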

Obtaining and Reviewing Log Files

Example: Following the Core Pod Logs

When troubleshooting issues with the Core service, it can be helpful to follow the application.log file within the Core pod in Kubernetes. From there, we can perform an action that interacts with the Core service and watch the logs in real time.

We can do this in a couple of different ways:

Method 1: Shell into the Core Pod and Follow the Logs

# Get all of the pods running in the kinetic namespace
kubectl get pods -n kinetic

# --- Output --- (shortened for brevity; other pods are also running)
# NAME                                       READY   STATUS      RESTARTS       AGE
# agent-7bbb8cb5d5-6r8fp                     2/2     Running     0              1d
# core-8b554599c-5g85f                       2/2     Running     0              1d
# ...

# Describe the core pod
kubectl describe pod core-8b554599c-5g85f -n kinetic

# --- Output (shortened for brevity, to show the containers running on the pod) ---
# ...
# Containers:
#   nginx:
#     ...
#   core:
#     ...

# Shell into the core container running on the core pod.
# The "output" is a new shell prompt from within the container
kubectl exec -it -c core -n kinetic core-8b554599c-5g85f -- bash

# Change Directory into the /app/core/logs/ folder
cd /app/core/logs

# Follow the application.log file
# Ctrl-C to stop following the file at any time
tail -f application.log

# To exit the session
exit

Method 2: Follow the application.log File Without Shelling Into the Container

# Get all of the pods running in the kinetic namespace
kubectl get pods -n kinetic

# --- Output --- (shortened for brevity; other pods are also running)
# NAME                                       READY   STATUS      RESTARTS       AGE
# agent-7bbb8cb5d5-6r8fp                     2/2     Running     0              1d
# core-8b554599c-5g85f                       2/2     Running     0              1d
# ...

# Describe the core pod
kubectl describe pod core-8b554599c-5g85f -n kinetic

# --- Output (shortened for brevity, to show the containers running on the pod) ---
# ...
# Containers:
#   nginx:
#     ...
#   core:
#     ...

# Execute the tail command without opening a shell in the core container
# Ctrl-C will stop watching the log file at any time
kubectl exec -it -c core -n kinetic core-8b554599c-5g85f -- tail -f /app/core/logs/application.log
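
If the core container also streams its logs to stdout (whether it does depends on how your installation is configured, so treat this as an assumption), the same logs can be followed with kubectl logs instead of exec.

# Follow the container's stdout/stderr without exec'ing into it
# Ctrl-C to stop following at any time
kubectl logs -f -c core -n kinetic core-8b554599c-5g85f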

Copying the application.log File from the Core Pod

It can be helpful to copy the application.log file from the Core pod in Kubernetes. This lets you look at the application logs from the Core service or send the log file to Kinetic Data Support for assistance with troubleshooting.

# Get all of the pods running in the kinetic namespace
kubectl get pods -n kinetic

# --- Output --- (shortened for brevity; other pods are also running)
# NAME                                       READY   STATUS      RESTARTS       AGE
# agent-7bbb8cb5d5-6r8fp                     2/2     Running     0              1d
# core-8b554599c-5g85f                       2/2     Running     0              1d
# ...

# Describe the core pod
kubectl describe pod core-8b554599c-5g85f -n kinetic

# --- Output (shortened for brevity, to show the containers running on the pod) ---
# ...
# Containers:
#   nginx:
#     ...
#   core:
#     ...

# Copy the application.log file from the core container to the home directory on the local machine
kubectl cp -n kinetic -c core core-8b554599c-5g85f:/app/core/logs/application.log ~/application.log

# Open the file in vi for viewing
# (or, cat the file)
vi ~/application.log
cat ~/application.log
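
Once the file is local, it can be searched like any other text file. The search term below is just an illustration, not a reference to a specific Kinetic log format.

# Search the copied log for error entries (case-insensitive)
grep -i error ~/application.log

# Show the last 100 lines of the copied log
tail -n 100 ~/application.log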

Connecting to a Container

You can run the following command to shell into a container within a pod (if the container has a shell available):

kubectl exec -it <POD> -n <NAMESPACE> -c <CONTAINER> -- sh

Alternatively, you can run the following command to use bash (if it is available in the container):

kubectl exec -it <POD> -n <NAMESPACE> -c <CONTAINER> -- bash

Debugging

Restarting the Core Deployment

It can be helpful to restart the Core deployment as part of the troubleshooting process (for example, if Core is behaving abnormally, a restart may fix the problem).

# Get the pods running in the kinetic namespace
# While this isn't necessary for a restart, we may want to first verify that the core pods are running
# We pipe the output to grep to find only the core pods
kubectl get pods -n kinetic | grep core

# --- Output ---
# core-8b554599c-5g85f                       2/2     Running     0              1d

# Get our list of deployments
kubectl get deployment -n kinetic

# --- Output ---
# NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
# ...
# core                      1/1     1            1           1d
# ...

# Rolling restart the core deployment
# Kubernetes will spin up replacement pods before terminating the old pods
# to keep availability
kubectl rollout restart deployment core -n kinetic

# --- Output ---
# deployment.apps/core restarted

# Check the status of the rollout restart
# The command will automatically stop when the restart is complete
# Hit Ctrl-C to stop watching for the restart to complete (this will not stop the restart)
kubectl rollout status deployment core -n kinetic

# --- Output ---
# Waiting for deployment "core" rollout to finish: 1 out of 2 new replicas have been updated...
# Waiting for deployment "core" rollout to finish: 1 out of 2 new replicas have been updated...
# Waiting for deployment "core" rollout to finish: 1 out of 2 new replicas have been updated...
# Waiting for deployment "core" rollout to finish: 1 old replicas are pending termination...
# Waiting for deployment "core" rollout to finish: 1 old replicas are pending termination...
# deployment "core" successfully rolled out
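
Once the rollout reports success, it can be worth confirming that the replacement core pod is Ready and has a fresh age; the pod name suffix will have changed from the one seen earlier.

# Verify the replacement core pod is Running, Ready, and recently created
kubectl get pods -n kinetic | grep core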

Appendix: Pod Statuses

The statuses you may see in the Status column, and what each means:

  • Pending: The Kubernetes system has accepted the pod, but one or more containers have not been set up and made ready to run. This could be due to various reasons, such as insufficient resources on the cluster, waiting for another operation to complete, or initial scheduling.
  • Running: The pod has been bound to a node, and all containers have been created. At least one container is still running or is in the process of starting or restarting.
  • Terminating: A pod in the "Terminating" state is in the process of being shut down. Pods can remain in this state temporarily while they complete their shutdown procedures, such as finalizing log outputs or executing shutdown hooks.
  • Succeeded: All containers in the pod have terminated successfully and will not be restarted.
  • Failed: All containers in the pod have terminated, and at least one container has terminated in failure (exited with a non-zero exit status or was stopped by the system).
  • Unknown: The state of the pod could not be determined. This is often due to an error in communicating with the pod host.
  • CrashLoopBackOff: This is not a state but a common error you might see in your pod status. It indicates that one or more of the containers in the pod keep crashing, and Kubernetes is attempting to restart them, usually with back-off delays.
  • Evicted: The system terminated the pod due to a lack of resources or other conditions that made it unviable to run.
  • Completed: This status is similar to the "Succeeded" state at the pod level. It indicates that a container within the pod has finished executing its task successfully and has exited with a status code of 0. A pod can be considered in a "Succeeded" state when all of its containers are "Completed." This status is typically seen in jobs or pods meant to run a task to completion.
  • OOMKilled: This status indicates that a container within the pod was killed because it ran out of memory (OOM stands for Out Of Memory). Kubernetes or the underlying container runtime will terminate a container when it exceeds its allocated memory limit. This is an important status to monitor because it signifies that your application might be using more memory than expected or allocated, leading to potential disruptions in service. Adjusting the memory limits for the pod or optimizing the application's memory usage are potential ways to address this issue.
  • ImagePullBackOff: This status indicates that Kubernetes is having trouble pulling a container image from the registry. This could be due to various reasons, such as network issues, authentication problems with the container registry, or the image not existing.
  • ErrImagePull: Similar to "ImagePullBackOff", this status indicates an error in pulling the container image from the registry, but it typically means that the attempt to pull the image has failed immediately, without retrying. The cause could be a non-existent image or problems with credentials.
  • ContainerCreating: This status is seen when the container within the pod is being created. If a pod remains in this state for an extended period, it may indicate issues with volume mounts, container image pulling, or other resource-related problems.
  • Init:CrashLoopBackOff: This specific form of "CrashLoopBackOff" applies to init containers, which are specialized containers that run before the application containers in a pod. If an init container fails, it can prevent the pod from reaching a running state, as init containers must complete successfully before application containers are started.
  • Init:Error: Similar to the "Init:CrashLoopBackOff" status, this indicates that an init container has exited with an error. This status shows a problem executing the init container, but unlike "CrashLoopBackOff," it doesn't imply repeated attempts and failures.
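
When a pod reports one of the problem statuses above, describe and logs are usually the fastest way to find out why. The pod and container names below are placeholders; use the ones from your own cluster.

# Show recent events for the pod (scheduling failures, image pull errors, OOM kills, etc.)
kubectl describe pod <POD> -n <NAMESPACE>

# For CrashLoopBackOff, view the logs from the previous (crashed) container instance
kubectl logs <POD> -c <CONTAINER> -n <NAMESPACE> --previous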