Ordered upgrade of kernel module without reboot
It is sometimes desirable to upgrade the kernel module on a node without rebooting it.
To minimize the impact on the workload running in the cluster, the upgrade process should be executed sequentially, one
node at a time.
Because it requires knowledge of workload utilizing the kernel module, this process is managed by the cluster
administrator.
Prerequisite
In order to successfully upgrade a kernel module to a new version, the currently loaded kernel module should be
unloaded, and a new kernel module file should be loaded.
This means that all user-mode application/workloads using the kernel module should be terminated before attempting to
unload the kernel module.
There are 2 type of workloads that can use a kernel module:
- user workload: application running in the cluster that is accessing kernel module. It is deployed by the administrator, and it is their responsibility to make sure that the workload is not running on the node prior to kernel module unloading, and to make sure that the workload is back running on the node once the new kernel module version has been loaded.
- device plugin managed by KMM. The operator will manage the unloading/loading of the device plugin on a node being upgraded.
Due to Kubernetes limitations in label names, the combined length of Module
name and namespace may not exceed 39
characters.
Upgrade steps
- To run an ordered upgrade, set the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>=$moduleVersion
label on all the nodes that will be used by the kernel module.$moduleVersion
value should be equal to the value of theversion
field in theModule
. -
The
Module
can now be upgraded. Update the following fields:containerImage
(to appropriate kernel version)version
The update should be atomic: bothcontainerImage
andversion
fields must be updated simultaneously.
-
Terminate any workload using the kmod on the node being upgraded.
- Remove the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>
label on the node. This will unload the kernel module from the node. - If required, perform any additional maintenance required on the node for the kmod upgrade.
- Add the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>=$moduleVersion
label to the node.$moduleVersion
should be equal to the new value of theversion
field in theModule
(after step 2). - Restore any workload that would leverage the kernel module on the node.
In case no additional upgrading action (step 4) is needed, then the process boils down to changing the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>=$moduleVersion
label.
Steps 3, 4 and 5 are then unified into one step: update the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>
label value to new $moduleVersion
as set in
the Module
.
Implementation details
Components
- A dedicated controller in the operator watches
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>
labels on nodes. When it detects a change in the value of that label, it sets the following internal labels to initiate the upgrade process:beta.kmm.node.kubernetes.io/version-worker-pod.<module-namespace>.<module-name>
beta.kmm.node.kubernetes.io/version-device-plugin.<module-namespace>.<module-name>
- The operator manipulates its internal
NodeModulesConfig
resources to create worker Pods that unload the current version of a module and load the new one. - The operator creates new device plugin DaemonSets with a new node selector component:
kmm.node.kubernetes.io/version-device-plugin.<module-namespace>.<module-name>
This makes it possible to terminate the device plugin Pod before unloading the current module, and to create them again when the new module is loaded.
Flow
- When the
version
field is modified in theModule
, a new device plugin DaemonSet is created with the following internal node selector:beta.kmm.node.kubernetes.io/version-device-plugin.<module-namespace>.<module-name>=$moduleVersion
- When the
kmm.node.kubernetes.io/version-module.<module-namespace>.<module-name>
label is changed on the node, KMM's version label controller removes thebeta.kmm.node.kubernetes.io/version-device-plugin.<module-namespace>.<module-name>
label, which terminates the device plugin Pod; - Once the device plugin Pod is terminated, the controller removes the
beta.kmm.node.kubernetes.io/version-worker-pod.<module-namespace>.<module-name>
label, which removes the entry from the correspondingNodeModulesConfig
resource. This results in a worker Pod being created to unload the current kernel module. - Once the current kernel module is unloaded, the controller sets the
beta.kmm.node.kubernetes.io/version-worker-pod.<module-namespace>.<module-name>=$moduleVersion
label on the node, which adds back an updated entry into the correspondingNodeModulesConfig
resource. This results in a worker Pod being created to load the new kernel module. - Once the new kernel module is loaded, the controller sets the
beta.kmm.node.kubernetes.io/version-device-plugin.<module-namespace>.<module-name>=$moduleVersion
label on the node, which will cause the device plugin Pod to start again.