This lab concentrates on Kubernetes resource management. We will explore resource limits for containers, applied at the Deployment level as well as at the Namespace level (via LimitRanges). We will also discuss how to limit the sum of resources at the Namespace level using ResourceQuotas.
The tests are performed on the Katacoda Kubernetes Playground again. Even though I had not expected it, the Katacoda platform offers enough resources for these experiments, and enough ways to limit them.
Phase 1: Container Resource Limits of Deployments
Step 1.1: Deploy a Stressful POD
On the master, we first create a YAML file using the --dry-run option shown in the previous blog post; then we apply the file and check the result:
kubectl create deployment stress --image vish/stress --dry-run -o yaml > stress.yaml
kubectl apply -f stress.yaml
# output: deployment.apps/stress created
kubectl get deployments
# output:
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
stress   1/1     1            1           75s
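Note: on newer kubectl versions, the bare --dry-run flag is deprecated in favor of an explicit value; if your kubectl complains, use the client-side variant:

kubectl create deployment stress --image vish/stress --dry-run=client -o yaml > stress.yaml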
Step 1.2: Set Resource Limits for the POD
Now we change the YAML file, replacing the line
resources: {}
with
resources:
  limits:
    memory: "500Mi"
  requests:
    memory: "250Mi"
The YAML file now looks as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: stress
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: stress
    spec:
      containers:
      - image: vish/stress
        name: stress
        resources:
          limits:
            memory: "500Mi"
          requests:
            memory: "250Mi"
        terminationMessagePolicy: FallbackToLogsOnError
status: {}
Note: the memory value must not contain a blank.
I got the error message
Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...
at first, because I had tried to set the memory limit to "500 Mi" instead of "500Mi". Removing the space fixed the problem.
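For reference, Kubernetes memory quantities take a plain number directly followed by a suffix: binary suffixes such as Ki, Mi, Gi or decimal suffixes such as k, M, G, always without a space in between:

memory: "500Mi"   # valid: 500 mebibytes
memory: "500M"    # valid: 500 megabytes
memory: "500 Mi"  # invalid: the space breaks the quantity parser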
Contrary to what you might expect, starting the stress container does not stress the system yet:
kubectl logs stress-6f8b598b78-8s94p
# output:
I0719 15:42:52.484422       1 main.go:26] Allocating "0" memory, in "4Ki" chunks, with a 1ms sleep between allocations
I0719 15:42:52.484525       1 main.go:29] Allocated "0" memory
Step 1.3: Allocate POD Resources below Limit
To stress the system, we need to run the stress container with some additional arguments:
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: stress
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: stress
    spec:
      containers:
      - image: vish/stress
        name: stress
        resources:
          limits:
            cpu: "1"
            memory: "500Mi"
          requests:
            cpu: "0.5"
            memory: "250Mi"
        args:
        - -cpus
        - "2"
        - -mem-total
        - "400Mi"
        - -mem-alloc-size
        - "100Mi"
        - -mem-alloc-sleep
        - "1s"
        terminationMessagePolicy: FallbackToLogsOnError
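The changed deployment must be rolled out before it takes effect; re-applying the file on the master does the trick (kubectl replace, used further below, works as well):

kubectl apply -f stress.yaml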
Now we can see on the worker node that the stress process consumes almost 100% of a CPU and about 400Mi of memory:
node01 $ top
top - 16:10:32 up 48 min,  1 user,  load average: 0.97, 0.56, 0.27
Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us, 20.3 sy,  0.0 ni, 74.1 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem :  4045932 total,  2562724 free,   688828 used,   794380 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3109212 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6727 root      20   0  431604 426396   3184 S 100.0 10.5   2:41.12 stress
 1512 root      20   0  852216  92716  60100 S   2.3  2.3   1:32.58 kubelet
 1002 root      20   0  731160  92372  39316 S   1.0  2.3   0:49.77 dockerd
    7 root      20   0       0      0      0 S   0.3  0.0   0:01.05 rcu_sched
 6868 root      20   0       0      0      0 S   0.3  0.0   0:00.23 kworker/u8:3
    1 root      20   0   38080   6124   4008 S   0.0  0.2   0:04.48 systemd
...
Step 1.4: Allocate POD Resources above Limit
Now let us see what happens if stress tries to allocate more memory than the limit: we set the memory consumption of the stress container above the 500Mi limit:
spec:
  containers:
  - image: vish/stress
    name: stress
    resources:
      limits:
        cpu: "1"
        memory: "500Mi"
      requests:
        cpu: "0.5"
        memory: "250Mi"
    args:
    - -cpus
    - "2"
    - -mem-total
    - "600Mi"
    - -mem-alloc-size
    - "100Mi"
    - -mem-alloc-sleep
    - "1s"
    terminationMessagePolicy: FallbackToLogsOnError
We re-apply the deployment on the master
kubectl replace -f stress.yaml
and watch what happens to the POD running on the worker node:
node01 $ watch "kubectl get pods"
We will see something like the following:
master $ k get pods
NAME                      READY   STATUS      RESTARTS   AGE
stress-7bd7c8c65d-5xkhs   1/1     Running     1          15s
master $ k get pods
NAME                      READY   STATUS      RESTARTS   AGE
stress-7bd7c8c65d-5xkhs   0/1     OOMKilled   1          18s
... (the OOMKilled status repeats from 18s to 27s) ...
master $ k get pods
NAME                      READY   STATUS             RESTARTS   AGE
stress-7bd7c8c65d-5xkhs   0/1     CrashLoopBackOff   1          29s
master $ k get pods
NAME                      READY   STATUS    RESTARTS   AGE
stress-7bd7c8c65d-5xkhs   1/1     Running   2          30s
We can see that the POD goes from Running status to OOMKilled status and, via CrashLoopBackOff, back to Running status again.

So Kubernetes simply kills any POD that exceeds its memory limit. This is not friendly, but it is effective.
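If the status column alone is not conclusive, the OOM kill can also be read from the POD's last state; a quick check (the POD name is the one from this run, yours will differ):

kubectl describe pod stress-7bd7c8c65d-5xkhs | grep -A 5 "Last State"
# the Last State section should report the reason OOMKilled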
Let us see what happens if only the CPU limit is exceeded.

We had specified a CPU limit of 1, but stress tried to allocate 2 CPUs. Why wasn't it killed before we increased the memory needs of the stress process? My first guess was that the node offers only one CPU, but /proc/cpuinfo shows four processors, so that cannot be the reason:
node01 $ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
stepping        : 1
microcode       : 0x1
cpu MHz         : 2099.996
cache size      : 16384 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
...
processor       : 1
...
processor       : 2
...
processor       : 3
... (remaining fields omitted; all four processors report identical values apart from their id fields)
Step 1.5: Exceeding the CPU Limit
So let us use less memory, but lower the CPU limit to 0.4, so the stress process (which tries to allocate one full CPU) has the chance to exceed the CPU limit:
# stress.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: stress
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: stress
    spec:
      containers:
      - image: vish/stress
        name: stress
        resources:
          limits:
            cpu: "0.4"
            memory: "500Mi"
          requests:
            cpu: "0.1"
            memory: "250Mi"
        args:
        - -cpus
        - "1"
        - -mem-total
        - "100Mi"
        - -mem-alloc-size
        - "100Mi"
        - -mem-alloc-sleep
        - "1s"
        terminationMessagePolicy: FallbackToLogsOnError
status: {}
I would have expected the POD to be killed again, but this time that did not happen:
k get pods
NAME                     READY   STATUS    RESTARTS   AGE
stress-d5bf8ff87-pvvb4   1/1     Running   0          2m13s
The behavior is much better than that: the CPU limit is never actually "exceeded", because Kubernetes instructs the Linux kernel's scheduler to throttle the process, so it never goes above the limit:
node01 $ top
top - 16:42:48 up  1:20,  1 user,  load average: 0.54, 0.51, 0.39
Tasks: 132 total,   2 running, 130 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.0 us,  8.1 sy,  0.0 ni, 88.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  4045932 total,  2875716 free,   370532 used,   799684 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3425560 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
10426 root      20   0  113448 109280   3120 R 39.9  2.7   1:12.46 stress
 1512 root      20   0  852216  94204  60280 S  2.3  2.3   2:37.44 kubelet
 1002 root      20   0  731672  90128  39316 S  1.0  2.2   1:24.73 dockerd
...
That is nice! Instead of killing the POD, Kubernetes just throttles it.
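How does Kubernetes enforce the limit? Under the hood, the CPU limit is written as a CFS quota into the container's cgroup. As a minimal sketch, assuming a cgroup v1 node with the cgroupfs driver (the exact path is an assumption and will differ on other setups; <pod-uid> and <container-id> are placeholders):

# on the worker node; adjust the placeholders to your POD and container
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod<pod-uid>/<container-id>/cpu.cfs_quota_us
# a limit of cpu: "0.4" should show up as 40000, i.e. 40000 microseconds
# of CPU time per 100000-microsecond scheduling period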
However, if a single POD cannot consume more than 40% of the CPU, can we just scale the application to circumvent the limitation? Let us scale the deployment and find out.
Step 1.6: Increasing the number of PODs
k scale deployment stress --replicas=3
On the worker node, we see that each POD is limited to 40% CPU, but the whole deployment can consume a multiple of that by scaling horizontally:
node01 $ top
top - 17:08:05 up 55 min,  1 user,  load average: 2.68, 2.60, 1.27
Tasks: 139 total,   1 running, 138 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.1 us, 23.7 sy,  0.0 ni, 69.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.1 st
KiB Mem :  4045932 total,  2681328 free,   592728 used,   771876 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  3197920 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6617 root      20   0  113192 109544   3120 S 39.9  2.7   2:29.96 stress
 7104 root      20   0  113192 109664   3248 S 39.9  2.7   2:07.84 stress
 6884 root      20   0  113448 109284   3120 S 39.5  2.7   2:15.97 stress
 1548 root      20   0 1081608  95244  61412 S  2.7  2.4   1:42.81 kubelet
  994 root      20   0 1033228  94284  39756 S  1.0  2.3   0:56.09 dockerd
 9483 root      20   0       0      0      0 S  0.3  0.0   0:00.17 kworker/u8
Okay, the total amount of CPU is not limited for the deployment as a whole. Can we just move the resource limits from the container spec up to the Deployment spec? Let us try:
# stress-deployment-limit.yaml -- failed
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: stress
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  strategy: {}
  resources:
    limits:
      cpu: "0.4"
      memory: "500Mi"
    requests:
      cpu: "0.1"
      memory: "250Mi"
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: stress
    spec:
      containers:
      - image: vish/stress
        name: stress
        resources: {}
        args:
        - -cpus
        - "1"
        - -mem-total
        - "100Mi"
        - -mem-alloc-size
        - "100Mi"
        - -mem-alloc-sleep
        - "1s"
        terminationMessagePolicy: FallbackToLogsOnError
status: {}
No, we cannot set resource limits at the Deployment level:
k replace -f stress-deployment-limit.yaml
error: error validating "stress-deployment-limit.yaml": error validating data: ValidationError(Deployment.spec): unknown field "resources" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false
Phase 2: Namespace Level Policies
Step 2.1: Limit Ranges
Above, we have set resource limits per container by adding a resources section to the container spec within the Deployment; those limits apply to every container created from that spec. Can we do something similar for Namespaces?
Yes, we can. We need to create an object of type LimitRange in the target namespace. Let us create the LimitRange YAML file:
# limitrange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: low-resource-range
spec:
  limits:
  - type: Container
    default:
      cpu: 0.2
    defaultRequest:
      cpu: 0.1
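As a side note: a LimitRange is not restricted to defaults. According to the API, a Container-type entry can also carry min and max fields that reject PODs falling outside the range. A sketch for illustration only (the values are invented and not part of the lab):

# limitrange-minmax.yaml -- illustrative sketch
apiVersion: v1
kind: LimitRange
metadata:
  name: low-resource-range-strict
spec:
  limits:
  - type: Container
    default:          # limit applied if the container sets none
      cpu: 0.2
    defaultRequest:   # request applied if the container sets none
      cpu: 0.1
    max:              # no container may set a limit above this
      cpu: 0.5
    min:              # no container may set a request below this
      cpu: 0.05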
Now let us create a namespace and apply the limit range to it. For that, we just need to create the LimitRange in the corresponding namespace:
k create namespace low-resource-range
# output: namespace/low-resource-range created
k create -f limitrange.yaml --namespace low-resource-range
# output: limitrange/low-resource-range created
Now we create a POD and verify that it inherits the limit range from the namespace:
k create deployment nginx --image=nginx --namespace low-resource-range
We can see that the pod is up and running:
k get pods --namespace low-resource-range
NAME                     READY   STATUS    RESTARTS   AGE
nginx-65f88748fd-g88wg   1/1     Running   0          4m33s
Moreover, we can see that the POD has inherited the values from the limit range:
kubectl describe pod nginx-65f88748fd-g88wg -n low-resource-range
...
    Limits:
      cpu:  200m
    Requests:
      cpu:  100m
...
Interestingly, nothing of that sort can be seen at the Deployment level:

kubectl describe deployment nginx --namespace low-resource-range

The command does not show any hint of the limit range. The mechanism acts between the namespace and the POD, bypassing the Deployment. That may be for the better, since a changed LimitRange is then applied every time a new POD comes up.
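Apart from the POD, the configured range can always be reviewed on the LimitRange object itself:

kubectl describe limitrange low-resource-range --namespace low-resource-range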
Step 2.2: Resource Quotas
We can apply ResourceQuotas to a namespace as well. However, what is the difference between a LimitRange and a ResourceQuota?
Earlier, we tried to apply limits to the sum of resources of a Deployment; this is not supported. However, we can limit the overall resources within a namespace through ResourceQuotas: while a LimitRange constrains (and provides defaults for) each individual container or POD, a ResourceQuota caps the aggregated resource consumption of the whole namespace.
This was not part of the LFS458 lab, but looking at the official documentation, the handling of ResourceQuotas is very similar to that of LimitRanges. I have copied the example here:
kubectl create namespace myspace
cat <<EOF > compute-resources.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    pods: "4"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    requests.nvidia.com/gpu: 4
EOF
kubectl create -f ./compute-resources.yaml --namespace=myspace
I have not tested it yet, but the user should now receive a 403 Forbidden message when trying to create a resource that would exceed the quota.
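The current consumption against the quota can be reviewed at any time by describing the ResourceQuota object, which lists the hard limits next to the used amounts:

kubectl describe resourcequota compute-resources --namespace=myspace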