Published on
Exercise 3

ReplicaSet issue

Authors
  • Name
    Twitter

One of your colleague has containerised the famous 2048 game/puzzle, and want's to run in on the OpenShift cluster, for his lunch break time of course. However this container won't schedule.

You owe your colleague a few favours, so you agree to help him fix the scheduling issue.

Let's get started!

Table of Contents

What's going on?

Navigate to your project (exercise03-userXX), make sure you are in the developer perspective, and pick the "Topology view" from the left panel.

topology

You may try and identify the issue. Poke around, click, explore. If no luck after a few minutes, then check the hints below as needed.

Oh, I almost forgot. This 2048 container is running fine in the project exercise03-readonly. While comparing this project with yours will not help you, you might find it useful, at some point, to look at this project, and maybe observe a bit the resource consumptions of this pod. I think already said too much.

Good luck!

Hints

Hint 1

Check the topology view...

Have a look at the deployment information

hint1

From the notification in yellow, it looks like:

  • there is some quota setup on this project
  • OCP ask you to setup the CPU requests and limits for this deployment.

Note that this message is a notification. It will not disappear until you click on the cross. It will remain even if it's not relevant. The real source of the message is visible in the Event section of the ReplicaSets. You will want to check regularly there as you try and fix things.

Not sure what limits and requests are? Probably worth reading the first two paragraph of this documentation: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Since some quota have been applied to your namespace, OpenShift is requesting that you specify limits and/or requests for the associated resource. In this case, no CPU limits / requests were made for the 2048 deployment, and you will have to add some. While the yellow notification gives you a direct link, remember that you can access the resources limit from the Action drop down menu, see below.

hint1-2

Alternatively, you could edit the Yaml file of your deployment, but the Action -> resources limit form is much simpler. You can try and set those limits directly, however, it would be a good idea to get the exact quota in place for this project. You won't be able to edit it (you don't have the right permission), but you should be able to see it. Have a go!

Hint 2

How to check the quota on your project?

You probably get the idea that some resource quota have been applied to this project. You can easily view what you can work with for your 2048 app! Pick tha admin perspective, head over the Administration/Resource quotas, click on your quota, and scroll down.

hint2
hint2-2

Hint 3

How to check what's blocking my ReplicaSet to start the pod?

You probably already tried to adjust the requests / limits for your 2048 deployment. It may or may not work. If it does not work, one place to check for useful messages is in the Events section of your ReplicaSets, for example:

hint3
hint3

Hint 4

What happens if my resource requests / limits are too low?

If there is not enough CPU resource for your app, it might run slowly. On the other hand, if there is not enough memory for your application, you POD might hit some Out of Memory issue. You may encounter that. If so, head over the Events tab of your pod:

hint4

Hint 5

How to check resource usage?

Depending of the type of application deployed, you may or may not know the minimum amount of resources (in particular, memory) required to run. Sometime, you will need to "observe" the resource consumptions. You can head over the exercise03-readonly project that is running the 2048 app, and check it's resource usage. This should give you an idea for your own limits/requests memory.

hint5

Solution

You need to setup up the CPU request / limit and Memory request / limit lower than your project Quota. For example, the value in the screenshot below should work

solution1

After this change, you will have a 2048 pod running, but the deployment is still in Yellow. That's because this deployment set the replicas count to two.

solution2

De we really need high availability for the 2048 game? Not really, so we can decrease the replica count to 1. At this stage, the deployment will be dark blue.

solution3

Now, while this is working fine, one last question for you. Considering that:

  • The Quota limit is 20MiB, which is enough for this game to run
  • The Quota request is 15MiB which is not enough for this game to run
  • We set the deployment request and limit equal to the quotas

Do you think this pod is at risk of being killed if the memory of this node is exhausted?

The request really set the guaranteed amount of Memory for this container. But this amount is not enough to run this game, as it needs about 20 MiB. While OpenShift allow this container to use up to the Limit of 20MiB if resource permit, if this node suddenly starve Memory, this container will hit an OOM issue, and will be killed. It may be rescheduled on another node, or may not be reschedulable, but that's another story.

Needless to say, the next step would be to ask the Administrator to bump the Memory Request of this project to 20MiB, to be safe from issues.

In the meanwhile, let's see if you can hit the 2048 high score of this workshop :)

Did you know?

Resource Quota from Developer Perspective

While we showed you how to access the resource quota page from the admin perspective, you can also access it from the Developer perspective.

duk1

Memory as seen by the pod

Containers do not virtualize memory or CPU. This means that from within a container, an application will see that the total amount of memory available (in the container) is the total amount of memory of the node. Even if limits / requests are being used.

You won't be able to start a terminal from your 2048 pod, due to very tight memory constraint on this project. But for you, I took a screenshot comparing the memory as seen by the 2048 pod, and the node. The values are equal.

duk2

This means that some applications that like to auto-calculate the memory it will use (i.e. a Java app auto-setting it's own heap size) will calculate against the total amount of memory, which may create problem. This is why in a containerised world, it is important to carefully control the memory the application will use (i.e. by setting the heap size).

Take away

  • ResourceQuotas are usually set on Project, in which case pods/containers must specify the Limit and Request for each of those CPU/Mem resource.
  • It may be useful to Observe the resource profile of an application over time, to understand how much resource should be allocated
  • While CPU overcommitment may not result in application crash, as CPU access can be throttle, it's a different story for Memory, and failing to carefully manage Memory allocation may result in OOM
  • ReplicaSet resources, which are managed by the Deployment resource, will provide valuable insights, when checking the ReplicaSet Events.