Published on
Exercise 2

Pod crash

Authors
  • Name
    Twitter

The OpenShift Social Club has decided to organise a chess tournament, and want to host the chess game on OpenShift. They almost manage to get the pod to run, but there is an issue. As the OpenShift expert, you have been voluntold to investigate the issue.

Note that even if you manage to solve this without any hint, you will want to explore Hint 3.

Let's get started!

Table of Contents

What's going on?

Navigate to your project (exercise02-userXX), make sure you are in the developer perspective, and pick the "Topology view" from the left panel. The deployment is in red (sometime orange for a few seconds), so something is not right, we want this nice dark blue deployment color

topology

You may try and identify the issue. Poke around, click, explore. If no luck after a few minutes, then check the hints below as needed.

Hints

Hint 1

How do I see a deployment overview?

You can get a quick overview of the deployment situation by clicking on the deployment icon. You can probably see some early indications of the problem.

hint1

In the following hints, we are going to explore the three options (events, logs and debugs).

Hint 2

Checking the events...

From Hint1, if you click on View events, you should see:

hint2

You probably noticed there is not much additional information, just a confirmation of the crashloop.

Hint 3

Exploring container debugs...

From Hint1, if you click on Debug container chess: OpenShift starts a new chess container, but overwrite the entrypoint command to give you a shell instead:

hint3

This looks like a node app, if you are a bit familiar, you probably know you can usually start the app with npm run. You can try and run the npm command, this should give you some logs that indicate everything is running, as expected. So why the application doesn't start in the normal (non-debug) chess pod?

Make sure to close the debug pod, we don't really want them to keep running in the background once the investigation is done.

Hint 4

Exploring container logs...

From Hint1, if you click on View Logs: You have direct access to the container log. What do you see?

hint4

The entrypoint used by a container is typically "baked" inside the container image. The next step would be to fix the container image itself. However, we won't do that for this exercise. Instead, we are going to overwrite this entrypoint in the deployment definition, a bit like what's done automatically at the pod definition level, when running the container in debugger mode (see Hint 3).

You will want to modify the Yaml file of the Deployment (not the Pod) to overwrite this entrypoint. Below is some useful documentation, first to make sure you understand the subtle difference between command (CMD) and entrypoint (ENTRYPOINT) in a containerfile, and how they work together:

But more importantly, to update the Deployment definition, you can lean on the build-in schema viewer:

hint4
hint4

You can also get inspired by this page, ignoring the fact that the example is on a DeploymentConfig (very similar to Deployment, yet deprecated). https://docs.openshift.com/container-platform/4.15/applications/deployments/managing-deployment-processes.html#deployments-exe-cmd-in-container_deployment-operations

Solution

Ok, if you looked at the hints, you probably understood that:

  • the "baked" entrypoint of that container has a typo
  • for this exercise, you should update the Deployment definition (instead of trying to fix the container image) to overwrite this entrypoint

Here is how it looks:

solution

Your container should now start, and you should be able to start a chess game by clicking on the route. One way to see your routes is via the "Administrator" perspective, in the Network section. The other way is via the developer perspective

solution

You should be able to play a few games, and maybe qualify for the next Candidates tournament.

solution

Did you know?

Administrator perspective

When you switch to the admin perspective, you won't have the nice "topology view", but the information is presented to you with more details, often better suited when doing admin tasks.

duk1

This is particularly useful when you have an admin role (your userxx does not have admin role), as this give you a view of your workload across all projects. For illustration below zeadmin has admin role, and can list pods across projects.

duk3

Logs in multi container pods

While most pods have just one container, you may have pods with multiple containers, or even init-containers. You will be able to access the logs of each of those via a dropdown menu. Here is an example on how it looks like:

duk2

When a pod is in crashloop, you can access the "debug a container" link from the "Logs" section of the pod:

duk4

This link is not displayed if the pod is running normally (as of OpenShift 4.15.8). However, if you are fast enough, you may catch the link while it is still visible when a new pod starts. See the animation below. Are you fast enough to reproduce?

duk4

Take away

When a pod crashloop, the common approach to understand what's going on are:

  • Look at the logs of the failed container(s)
  • Start the failing container in debugger mode to inspect it's content, and try some commands

Stretch goal (networking)

Have a look at Hint 3, the one about the debug pod. While we started the npm app with the right command, one step further would be to create a service, a route, and open the route from a web browser, all that to check that the chess game is correctly running in this debug pod. Here is some helpful info to achieve that:

  • the debug pod doesn't have any label. You will need to add one (edit the yaml file), i.e. app: debug-chess that will be used by the service
  • when creating a service via the web console, the default yaml will not have the correct app selector (use the label you created in the debug pod!) so you will need to adjust that. Additionally, you will need to adjust use the network port. Make sure to use port: 3000 and targetPort: 3000 for this node app
  • the route creation via the web console should be intuitive.