NVIDIA’s solution
In the GPU-Operator solution, there is a VFIO-Manager component that can unbind a GPU device from either the GPU driver or the VFIO-PCI driver and bind it to the VFIO-PCI driver. The VFIO-Manager is controlled by the vfio-manage.sh script.
vfio-manage.sh Functionality Overview
Main Function:
help: display the help information
bind: bind the device to the vfio-pci driver
unbind: unbind the device from its current driver (typically vfio-pci)
Options:
--all: operate on all devices
--device_id: operate on a single specified device
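The command and option layout above can be sketched as a small dispatcher. This is only an illustration, not the actual vfio-manage.sh source: the function names `usage` and `dispatch` are hypothetical, the default of "all devices" when no option is given is inferred from the description of bind and unbind, and the sketch only prints the action it would take instead of touching sysfs.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher mirroring the commands and options listed above.
# It reports the chosen action; it does not bind or unbind anything.

usage() {
    echo "Usage: vfio-manage.sh {help|bind|unbind} [--all | --device_id <pci-address>]"
}

dispatch() {
    local cmd="${1:-help}"
    if [ "$#" -gt 0 ]; then shift; fi
    # Assumption: with no option, the command applies to all devices.
    local target="all"

    while [ "$#" -gt 0 ]; do
        case "$1" in
            --all)
                target="all"
                ;;
            --device_id)
                if [ "$#" -lt 2 ]; then
                    echo "error: --device_id needs a PCI address" >&2
                    return 1
                fi
                target="$2"
                shift
                ;;
            *)
                echo "error: unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done

    case "$cmd" in
        help) usage ;;
        bind|unbind) echo "$cmd $target" ;;
        *) usage >&2; return 1 ;;
    esac
}
```

For example, `dispatch bind --all` prints `bind all`, and `dispatch unbind --device_id 0000:65:00.0` prints `unbind 0000:65:00.0`.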
Function: bind
If a specific GPU is specified, bind only that GPU; otherwise, bind all devices
bind all devices: find all devices and bind each target GPU sequentially
bind target GPU
- If the device is not an NVIDIA GPU, return with an error
- Execute bind_pci_device:
  - Check if the VFIO-PCI driver is already bound; if so, return
  - Execute unbind_from_other_device:
        # 1. If no driver is bound, there is nothing to unbind; return
        [ -e "/sys/bus/pci/devices/$gpu/driver" ] || return 0
        # 2. Get the current driver
        existing_driver=$(readlink -f "/sys/bus/pci/devices/$gpu/driver")
        existing_driver_name=$(basename "$existing_driver")
        # 3. If the current driver is already vfio-pci, return
        [ "$existing_driver_name" != "vfio-pci" ] || return 0
        # 4. Unbind from the current driver and clear the driver override
        echo "$gpu" > "$existing_driver/unbind"
        echo > /sys/bus/pci/devices/$gpu/driver_override
  - Execute two bind operations:
        echo "vfio-pci" > /sys/bus/pci/devices/$gpu/driver_override
        echo "$gpu" > /sys/bus/pci/drivers/vfio-pci/bind
- If the device is a graphics GPU, also bind the auxiliary device
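The bind steps above can be put together in one function. The following is a minimal sketch under stated assumptions, not the script's real code: `SYSFS_PCI`, `current_driver`, and `bind_to_vfio` are hypothetical names, and `SYSFS_PCI` exists only so the flow can be tried against a fake directory tree. On a real host it would be /sys/bus/pci and the writes would need root. The auxiliary-device step is left out.

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_PCI is a hypothetical override for dry runs against a
# fake tree; on a real host it is /sys/bus/pci and the writes require root.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the name of the driver bound to device $1, or nothing if unbound.
current_driver() {
    local link="$SYSFS_PCI/devices/$1/driver"
    [ -e "$link" ] || return 0
    basename "$(readlink -f "$link")"
}

bind_to_vfio() {
    local gpu="$1"
    local dev="$SYSFS_PCI/devices/$gpu"
    local vendor=""

    # Reject anything that is not an NVIDIA device (vendor ID 0x10de).
    if [ -r "$dev/vendor" ]; then
        vendor="$(cat "$dev/vendor")"
    fi
    if [ "$vendor" != "0x10de" ]; then
        echo "error: $gpu is not an NVIDIA device" >&2
        return 1
    fi

    local drv
    drv="$(current_driver "$gpu")"

    # Already bound to vfio-pci: nothing to do.
    [ "$drv" != "vfio-pci" ] || return 0

    # Release the device from whatever other driver holds it.
    if [ -n "$drv" ]; then
        echo "$gpu" > "$dev/driver/unbind"
        echo > "$dev/driver_override"
    fi

    # Pin the device to vfio-pci, then ask vfio-pci to claim it.
    echo "vfio-pci" > "$dev/driver_override"
    echo "$gpu" > "$SYSFS_PCI/drivers/vfio-pci/bind"
}
```

Writing driver_override before the bind request makes the kernel match this device to vfio-pci only, which is why the two writes happen in that order.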
Function: unbind
If a specific GPU is specified, unbind only that GPU; otherwise, unbind all devices
unbind all NVIDIA GPUs
Find all devices under /sys/bus/pci/devices and read each device's vendor ID. If the vendor ID equals 0x10de, the device is an NVIDIA device; unbind each matching GPU sequentially.
[!tip] Vendor ID
The PCI vendor ID of NVIDIA devices is 0x10de.
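The vendor-ID scan can be sketched as follows. This is an illustration rather than the script's actual code: `SYSFS_PCI` and `list_nvidia_devices` are hypothetical names, and the sketch matches every NVIDIA PCI function, so a real script may filter further (for example by device class).

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_PCI is a hypothetical override for testing against a
# fake tree; on a real host it is /sys/bus/pci.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the PCI address of every device whose vendor ID is 0x10de.
list_nvidia_devices() {
    local dev vendor
    for dev in "$SYSFS_PCI"/devices/*; do
        # Skip entries without a readable vendor file (or an unmatched glob).
        [ -r "$dev/vendor" ] || continue
        vendor="$(cat "$dev/vendor")"
        if [ "$vendor" = "0x10de" ]; then
            basename "$dev"
        fi
    done
}
```

Each printed address can then be passed to the per-GPU bind or unbind routine in turn.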
unbind target GPU
- Check if it’s an NVIDIA GPU; if not, return
- Execute unbind_from_driver:
  - Check if the device is bound to a driver; if not, return
  - Get the path of the driver currently bound to the GPU
  - Execute two unbind operations:
        echo "$gpu" > "$existing_driver/unbind"
        echo > /sys/bus/pci/devices/$gpu/driver_override
- If the device is a graphics GPU, also unbind the auxiliary device
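The per-GPU unbind steps above can be sketched the same way. The names `SYSFS_PCI` and `unbind_gpu` are hypothetical, returning success for a non-NVIDIA or already-unbound device is an assumption (the description only says "return"), and the auxiliary-device step is omitted.

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_PCI is a hypothetical override for dry runs against a
# fake tree; on a real host it is /sys/bus/pci and the writes require root.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

unbind_gpu() {
    local gpu="$1"
    local dev="$SYSFS_PCI/devices/$gpu"
    local vendor=""

    # Not an NVIDIA device: nothing to do (assumed to be a silent no-op).
    if [ -r "$dev/vendor" ]; then
        vendor="$(cat "$dev/vendor")"
    fi
    [ "$vendor" = "0x10de" ] || return 0

    # No driver bound: nothing to unbind.
    [ -e "$dev/driver" ] || return 0

    # Resolve the driver directory, detach the device, clear the override.
    local existing_driver
    existing_driver="$(readlink -f "$dev/driver")"
    echo "$gpu" > "$existing_driver/unbind"
    echo > "$dev/driver_override"
}
```

Clearing driver_override after the unbind lets the kernel's normal driver matching apply to the device again the next time it is probed.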