Introduction

Getting the best application performance is crucial for businesses that rely on high-performance computing environments. Although much of the industry's attention is on GPUs, many customers still run CPU-bound workloads. CPU-intensive applications often suffer from noisy neighbor interference, which can degrade performance significantly. This post covers best practices and technical recommendations for tuning performance in environments such as VMware vSphere and Red Hat OpenShift Container Platform. Whether your workers are bare-metal or virtual, these steps will help you improve operations and maximize performance.
Let’s start optimizing OpenShift application performance.

Procedure

The following procedure covers each computing layer that must be tuned to reach this goal: bare metal, hypervisor, container infrastructure, and pod infrastructure.

The recommendations are divided into the following sections:

  • Bare Metal Configuration
  • VMware vSphere Hypervisor Configuration
  • OpenShift Infrastructure Recommendations
    • General Recommendations
    • Virtual and Physical Workers Configurations
    • Pod Configurations

Bare Metal Configuration

Noisy neighbor issues can also occur within a single socket, where processes compete for shared CPU cache. Intel-based servers can use Resource Director Technology (RDT) to control this contention. See the explanation of Resource Director Technology on Intel’s website.

In our deployment we used HP servers with Intel CPUs. These are the settings we applied at the bare-metal level through the iLO console:

Processor Options

We recommend disabling Hyper-Threading. This gives each task an entire physical core instead of splitting the core into two hardware threads.

Memory Options

NOTE: The settings on this page are typical; the important point is that Virtual NUMA is disabled.

Virtualization Options

Enable the Intel VT and Intel VT-d virtualization options.

NOTE: It doesn’t hurt to enable SR-IOV now for future network performance optimizations, so that we won’t need to return to this page later.

Advanced Performance Tuning Options

The following settings turn off power-saving features, because we want maximum, consistent performance rather than power management that varies CPU behavior.

Power and Performance Options

Here, taking an even more aggressive approach, disable all low-power options for the CPU.

Page 1/2

Page 2/2

This concludes the Bare Metal iLO configuration options preferred for optimal CPU performance.

VMware vSphere Hypervisor

General Suggestions:

Because the workers run on VMware, several settings must be addressed at the hypervisor level:

  1. Latency-sensitive VM – Ensure Latency Sensitivity is set to High [Reference Link]
  2. Consider using virtual Hyper-Threading (vHT) [Reference Link] – This exposes the sibling hyper-thread of each vCPU to the worker VM (relevant for ESXi 8 and higher)
  3. Advanced NUMA attributes – Consider disabling NUMA rebalancing by setting the following to 0 [Reference Link]. In our deployment we changed it in the advanced system settings of each host in the NetApp cluster
Numa.RebalanceEnable 0
  4. Back guest vRAM with 1GB pages (and set hugepages in the CoreOS VM) [Reference Link]. In our deployment this was done on all sky cluster nodes
sched.mem.lpage.enable1GPage = "TRUE"
  5. Assign CPU affinity for the VM [Reference Link, Reference Link]

Optimizing OpenShift Infrastructure

General Configuration Recommendations

Based on the OpenShift document “Tuning nodes for low latency with the performance profile”, here are several suggestions:

  1. Configure a MachineConfigPool (MCP) called worker-cnf to select the nodes that run low-latency workloads.
  2. Add three infra nodes and move infrastructure-related workloads to them. Add taints to the infra nodes and matching tolerations to the infrastructure workloads so that only those workloads run on the dedicated nodes.

https://docs.openshift.com/container-platform/4.14/machine_management/creating-infrastructure-machinesets.html

https://access.redhat.com/solutions/5034771

  3. Add three more worker nodes for application workloads only.
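
The first two suggestions can be sketched as manifests. This is a minimal sketch rather than a definitive configuration: the pool below assumes that nodes are labeled `node-role.kubernetes.io/worker-cnf`, and the second document assumes that the infra nodes carry a `node-role.kubernetes.io/infra` taint with the `NoSchedule` effect. The default IngressController is used only as one example of an infrastructure workload that could be moved.

```yaml
# Machine config pool that selects nodes labeled for low-latency workloads.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, worker-cnf]   # inherit the base worker configuration
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ""
---
# Example of placing an infrastructure workload on the infra nodes.
# Only the nodePlacement stanza matters here; merge it into the existing resource.
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""
    tolerations:
    - key: node-role.kubernetes.io/infra   # assumed taint key on the infra nodes
      operator: Exists
      effect: NoSchedule
```

A node joins the worker-cnf pool once it carries the `node-role.kubernetes.io/worker-cnf` label; workloads without a matching toleration stay off the tainted infra nodes.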

Virtual and Physical Worker Configuration Recommendations

  1. Consider setting a PerformanceProfile with the following parameters:
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: manual
spec:
  # CPU isolation and reservation [8]
  # NOTE: adjust these ranges to the virtual or physical capabilities of your compute workers.
  cpu:
    isolated: 0-19 # for 24 vCPUs
    reserved: 20-23
  # Real-time kernel
  realTimeKernel:
    enabled: true
  # Workload hints
  workloadHints:
    highPowerConsumption: true
    realTime: true
  # Apply all of the above to the worker-cnf machine pool.
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""

NOTE: When you configure reserved and isolated CPUs, the infra containers in pods use the reserved CPUs and the application containers use the isolated CPUs.
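
The hypervisor section mentions setting hugepages in the CoreOS VM when guest vRAM is backed by 1GB pages. One way to do that, assuming the same PerformanceProfile is used, is its `hugepages` stanza. The fragment below is a sketch to merge under `spec:` of the profile above; the page count is a hypothetical value and must be sized to the memory of your workers.

```yaml
# Fragment: merge under spec: of the PerformanceProfile named "manual".
spec:
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      count: 16   # hypothetical value; size this to your worker memory
```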

  2. Consider disabling Transparent Huge Pages (THP) [Reference Link] [6]
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: thp-workers-profile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile for OpenShift to turn off THP on worker nodes
      include=openshift-node

      [vm]
      transparent_hugepages=never
    name: openshift-thp-never-worker

  recommend:
  - match:
    - label: node-role.kubernetes.io/worker-cnf
    priority: 25
    profile: openshift-thp-never-worker

Application Configuration Recommendations

NOTE: Configure your application pods as shown in the following examples:

  1. Disable Power Saving Mode [Reference Link]
apiVersion: v1
kind: Pod
metadata:
  #...
  annotations:
    #...
    cpu-c-states.crio.io: "disable"
    # cpu-freq-governor.crio.io: "performance"
  #...
spec:
  #...
  runtimeClassName: performance-<profile_name>
  2. Disable interrupt processing on the CPUs where pinned containers run [Reference Link]
apiVersion: v1
kind: Pod
metadata:
  annotations:
    irq-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-<profile_name>
  3. Disable the CPU CFS quota:

This eliminates CPU throttling for pinned pods [Reference Link]

apiVersion: v1
kind: Pod
metadata:
  annotations:
    cpu-quota.crio.io: "disable"
spec:
  runtimeClassName: performance-<profile_name>
#...
  4. Disable CPU load balancing. Isolated cores can still be disturbed when the kernel scheduler balances load across them. Attach the following annotation if Guaranteed QoS pods require full use of their CPUs:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-<profile_name>
#...
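
The annotations above can be combined in a single pod. The following is a minimal sketch, not a definitive manifest. It assumes the PerformanceProfile named `manual` shown earlier, so the runtime class is `performance-manual`. The pod name, container name, image, and resource sizes are hypothetical. The container requests whole CPUs with requests equal to limits, which is what places the pod in the Guaranteed QoS class and makes it eligible for CPU pinning.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-app                  # hypothetical name
  annotations:
    cpu-c-states.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-manual  # performance-<profile_name>
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  containers:
  - name: app                           # hypothetical container
    image: registry.example.com/app:latest   # hypothetical image
    resources:
      # Requests equal to limits with an integer CPU count: Guaranteed QoS.
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```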

Summary

Tuning OpenShift application performance is vital for smooth operations, especially for CPU-intensive workloads. This guide provided recommendations for bare-metal and virtualized OpenShift setups, including advanced tuning and power management options. It also covered OpenShift infrastructure tuning, virtual and physical worker configuration, and pod-level tuning. Whether your OpenShift workers run on bare metal or on VMware vSphere, these practices help you tackle noisy neighbor interference and maximize application performance.