gVisor adds a layer of security to your AI/ML applications and other CUDA workloads while introducing negligible overhead. By running these applications in a sandboxed environment, you can isolate your host system from potential vulnerabilities in AI code. This is crucial when handling sensitive data or deploying untrusted AI workloads.
gVisor supports running most CUDA applications on preselected versions of
NVIDIA’s open source driver.
To achieve this, gVisor implements a proxy driver inside the sandbox, henceforth
referred to as nvproxy. nvproxy proxies the application’s interactions with
NVIDIA’s driver on the host. It provides access to NVIDIA GPU-specific devices
to the sandboxed application. The CUDA application can run unmodified inside
the sandbox and interact transparently with these devices.
The runsc flag --nvproxy must be specified to enable GPU support. gVisor
supports GPUs in the following environments.
The nvidia-container-runtime is packaged as part of the NVIDIA GPU Container
Stack. This runtime is just a shim and delegates all commands to the configured
low-level runtime (which defaults to runc). To use gVisor, specify runsc as the
low-level runtime in /etc/nvidia-container-runtime/config.toml via the runtimes
option and then run CUDA containers with nvidia-container-runtime. The runtimes
option accepts an executable path, or an executable name that is searchable in
$PATH. To specify runsc with specific flags, the following wrapper executable
can be used:
#!/bin/bash
exec /path/to/runsc --nvproxy <other runsc flags> "$@"
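For illustration, the relevant part of /etc/nvidia-container-runtime/config.toml
might then look roughly like the following sketch, assuming the wrapper script
above is installed at /usr/local/bin/runsc-wrapper (a hypothetical path); check
your installed config for the exact layout:

$ cat /etc/nvidia-container-runtime/config.toml
[nvidia-container-runtime]
# Low-level runtimes, tried in order; entries may be absolute paths or
# executable names found in $PATH. The runsc wrapper is listed first here.
runtimes = ["/usr/local/bin/runsc-wrapper", "runc"]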
NOTE: gVisor currently only supports legacy mode. The alternative, csv mode, is not yet supported.
The “legacy” mode of nvidia-container-runtime is directly compatible with the
--gpus flag implemented by the docker CLI. So with Docker, runsc can be used
directly (without having to go through nvidia-container-runtime).
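Since runsc is invoked directly in this case, Docker itself needs to know about
runsc and its --nvproxy flag. A minimal sketch of /etc/docker/daemon.json (the
runsc path is an assumption; adjust for your installation):

$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/local/bin/runsc",
            "runtimeArgs": ["--nvproxy"]
        }
    }
}
$ sudo systemctl restart docker

With that in place, the CUDA sample container below can be run directly: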
$ docker run --runtime=runsc --gpus=all --rm -it nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
GKE uses a different GPU container stack than NVIDIA’s. GKE has its own device
plugin (which is different from k8s-device-plugin). GKE’s plugin modifies the
container spec in a different way than the above-mentioned methods.
NOTE: nvproxy does not have integration support for k8s-device-plugin yet. So
k8s environments other than GKE might not be supported.
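As a rough sketch of what this looks like on GKE from the user’s side (the
image is the CUDA sample used above; the GPU node pool and driver installation
are assumed to be configured per GKE’s GPU documentation), a pod selects the
gVisor RuntimeClass and requests a GPU in the usual Kubernetes way:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  runtimeClassName: gvisor
  containers:
  - name: cuda-vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8
    resources:
      limits:
        nvidia.com/gpu: 1
EOF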
gVisor supports a wide range of CUDA workloads, including PyTorch and various
generative models like LLMs. Check out this blog post about running Stable
Diffusion with gVisor. gVisor undergoes continuous testing to ensure this
functionality remains robust. Real-world usage of gVisor across different CUDA
workloads helps discover and address potential compatibility or performance
issues in nvproxy.
nvproxy is a passthrough driver that forwards ioctl(2) calls made to NVIDIA
devices by the containerized application directly to the host NVIDIA driver.
This forwarding is straightforward: ioctl parameters are copied from the
application’s address space to the Sentry’s address space, and then a host
ioctl syscall is made. ioctls are passed through with minimal intervention;
nvproxy does not emulate NVIDIA kernel-mode driver (KMD) logic. This design
translates to minimal overhead for GPU operations, ensuring that GPU-bound
workloads experience negligible performance impact.
However, the presence of pointers and file descriptors within some ioctl
structs forces nvproxy to perform appropriate translations. This requires
nvproxy to be aware of the KMD’s ABI, specifically the layout of ioctl structs.
The challenge is compounded by the lack of ABI stability guarantees in NVIDIA’s
KMD, meaning ioctl definitions can change arbitrarily between releases. While
the NVIDIA installer ensures matching KMD and user-mode driver (UMD) component
versions, a single gVisor version might be used with multiple NVIDIA drivers.
As a result, nvproxy must understand the ABI for each supported driver version,
necessitating internal versioning logic for ioctls.
Consequently, nvproxy has the following limitations:

- Supports only certain GPU models.
- Supports only certain NVIDIA driver versions.
- Supports a limited set of ioctls on each device file.

gVisor currently supports NVIDIA GPUs:
While not officially supported, other NVIDIA GPUs based on the same microarchitectures as the above will likely work as well. This includes consumer-oriented GPUs such as RTX 3090 (Ampere) and RTX 4090 (Ada Lovelace).
Therefore, if you encounter an incompatible workload on an unsupported GPU that is based on one of the above microarchitectures, chances are that this workload is also incompatible in the same manner on one of the officially supported GPUs. Please open a GitHub issue with reproduction instructions so that it can be tested against an officially supported GPU.
The range of driver versions supported by nvproxy directly aligns with those
available within GKE. As GKE incorporates newer drivers, nvproxy will extend
support accordingly. Conversely, to manage versioning complexity, nvproxy will
drop support for drivers removed from GKE. This strategy ensures a streamlined
process and avoids unbounded growth in nvproxy’s versioning.
To see what drivers a given runsc version supports, run:
$ runsc nvproxy list-supported-drivers
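To check whether the host’s installed driver appears in that list, query the
host driver version with nvidia-smi (the version shown is illustrative):

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
550.54.15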
NOTE: runsc’s driver version is a strict version match because runsc cannot
assume ABI compatibility between driver versions. You may force runsc to use a
given supported ABI version with the --nvproxy-driver-version flag, even when
running on a host that has an unsupported driver version. However, doing so is
not officially supported, and running old drivers is generally not secure, as
many driver updates address security bugs. Bug reports with the
--nvproxy-driver-version flag set will be treated as invalid.
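If you nevertheless choose to force an ABI version, the flag is passed
alongside --nvproxy, for example in the wrapper executable shown earlier (the
version shown is illustrative and must be one reported by
list-supported-drivers):

exec /path/to/runsc --nvproxy --nvproxy-driver-version=550.54.15 "$@"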
gVisor only exposes /dev/nvidiactl, /dev/nvidia-uvm, and /dev/nvidia#.
Some unsupported NVIDIA device files are:

- /dev/nvidia-caps/*: Controls nvidia-capabilities, which is mainly used by
  Multi-instance GPUs (MIGs).
- /dev/nvidia-drm: Plugs into Linux’s Direct Rendering Manager (DRM) subsystem.
- /dev/nvidia-modeset: Enables the DRIVER_MODESET capability in nvidia-drm
  devices.
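You can confirm which device files are visible to a sandboxed container by
listing them from inside one (the image is an assumption; the output is
illustrative and will vary with GPU count):

$ docker run --runtime=runsc --gpus=all --rm nvidia/cuda:12.2.0-base-ubuntu22.04 sh -c 'ls /dev/nvidia*'
/dev/nvidia-uvm
/dev/nvidia0
/dev/nvidiactl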
To minimize maintenance overhead across supported driver versions, the set of
supported NVIDIA device ioctls is intentionally limited. This set was generated
by running a large number of CUDA workloads in gVisor. As nvproxy is adapted to
more use cases, this set will continue to evolve.
Currently, nvproxy focuses on supporting compute workloads (like CUDA).
Graphics and video capabilities are not yet supported due to missing ioctls.
If your GPU compute workload fails with gVisor, please note that some ioctl
commands might still be unimplemented. Please open a GitHub issue to describe
your use case. If a missing ioctl implementation is the problem, then the debug
logs will contain warnings with the prefix nvproxy: unknown *. See below on how
to run the ioctl_sniffer tool.
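As a sketch, assuming debug logging was enabled via the runsc flags --debug
--debug-log=/tmp/runsc/ (the log directory is arbitrary), those warnings can be
surfaced with:

$ grep -r "nvproxy: unknown" /tmp/runsc/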
There are a few methods to try when debugging GPU workloads. The first step
should be gVisor’s ioctl_sniffer tool; if your GPU workload fails due to
unimplemented ioctl commands in gVisor, this tool will provide a list of the
specific ones.
Occasionally, you may also need to dig into the NVIDIA GPU driver itself. To do so, clone the open source driver repository and check out the appropriate driver version.
DRIVER_VERSION=550.54.15
git clone https://github.com/NVIDIA/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
git checkout tags/$DRIVER_VERSION
For printk() debugging, it is advised to use portDbgPrintf(). See more
discussion here. You should be able to see the prints via dmesg(1).
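For example, if your added portDbgPrintf() calls include a distinctive marker
string (hypothetical here), the kernel log can be followed while reproducing
the issue with the rebuilt modules loaded (see below):

$ sudo dmesg --follow | grep "my-debug-marker"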
Then uninstall the existing NVIDIA driver, build the kernel modules from the local source, and reinstall the driver.
sudo /usr/bin/nvidia-uninstall
make modules -j$(nproc)
sudo make modules_install -j$(nproc)
# Load the modules in dependency order: nvidia-modeset and nvidia-uvm depend on
# nvidia, and nvidia-drm depends on nvidia-modeset.
sudo insmod kernel-open/nvidia.ko
sudo insmod kernel-open/nvidia-modeset.ko
sudo insmod kernel-open/nvidia-drm.ko
sudo insmod kernel-open/nvidia-uvm.ko
sudo sh NVIDIA-Linux-x86_64-$DRIVER_VERSION.run --no-kernel-modules
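To confirm that the rebuilt modules are the ones actually loaded, check the
loaded modules and the reported driver version:

$ lsmod | grep nvidia
$ cat /proc/driver/nvidia/version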
When downloading large models within gVisor, you might encounter application
segmentation faults due to host VMA exhaustion. To work around this, you can
set the value of /proc/sys/vm/max_map_count to a large number.
echo 1000000 | sudo tee /proc/sys/vm/max_map_count
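To persist this setting across reboots, it can also be written to a sysctl
configuration file (the file name is arbitrary):

echo 'vm.max_map_count=1000000' | sudo tee /etc/sysctl.d/99-max-map-count.conf
sudo sysctl --system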
Alternatively, you can simply pass the runsc flag --host-settings=enforce.
While CUDA support enables important use cases for gVisor, users should understand the security model around the use of GPUs in sandboxes. In short, while gVisor will protect the host from the sandboxed application, NVIDIA driver updates must be part of any security plan with or without gVisor.
First, a short discussion of gVisor’s security model. gVisor protects the host from sandboxed applications by providing several layers of defense. The layers most relevant to this discussion are the redirection of application syscalls to the gVisor sandbox and the use of seccomp-bpf on gVisor sandboxes.
gVisor uses a “platform” to tell the host kernel to reroute system calls to the sandbox process, known as the Sentry. The Sentry implements a syscall table, which services all application syscalls. The Sentry may make syscalls to the host kernel if it needs them to fulfill the application syscall, but it doesn’t merely pass an application syscall to the host kernel.
On sandbox boot, seccomp filters are applied to the sandbox. These filters constrain the set of syscalls that the sandbox can make to the host kernel, blocking access to most host kernel vulnerabilities even if the sandbox becomes compromised.
For example, CVE-2022-0185 is mitigated because gVisor itself handles the syscalls required to use namespaces and capabilities, so the application is using gVisor’s implementation, not the host kernel’s. For a compromised sandbox, the syscalls required to exploit the vulnerability are blocked by seccomp filters.
In addition, seccomp-bpf filters can filter by syscall arguments, allowing us
to allowlist granularly by ioctl(2) arguments. ioctl(2) is a source of many
bugs in any kernel due to the complexity of its implementation. As of writing,
gVisor does allowlist some ioctls by argument for things like terminal support.
For example, CVE-2024-21626 is mitigated by gVisor because the application
would use gVisor’s implementation of ioctl(2). For a compromised Sentry,
ioctl(2) calls with the needed arguments are not in the seccomp filter
allowlist, blocking the attacker from making the call. gVisor also mitigates
similar vulnerabilities that come with device drivers (CVE-2023-33107).
Recall that “nvproxy” allows applications to directly interact with supported ioctls defined in the NVIDIA driver.
gVisor’s seccomp filter rules are modified such that ioctl(2) calls can be made
only for supported ioctls. The allowlisted rules are aligned with each driver
version. This approach is similar to the allowlisted ioctls for terminal
support described above. It allows gVisor to retain the vast majority of its
protection for the host while allowing access to GPUs. All of the above CVEs
remain mitigated even when “nvproxy” is used.
However, gVisor is much less effective at mitigating vulnerabilities within the
NVIDIA GPU drivers themselves, because gVisor passes through calls to be
handled by the kernel module. If there is a vulnerability in a given driver for
a given GPU ioctl (read: feature) that gVisor passes through, then gVisor will
also be vulnerable. If the vulnerability is in an unimplemented feature, gVisor
will block the required calls with seccomp filters.
In addition, gVisor doesn’t introduce any additional hardware-level isolation
beyond that which is configured by the NVIDIA kernel-mode driver. There is no
validation of things like DMA buffers. The only checks are done in seccomp-bpf
rules to ensure ioctl(2) calls are made on supported and allowlisted ioctls.
Therefore, it is imperative that users update NVIDIA drivers in a timely manner with or without gVisor. To see the latest drivers gVisor supports, you can run the following with your runsc release:
$ runsc nvproxy list-supported-drivers
Alternatively, you can view the source code or download it and run:
$ make run TARGETS=runsc:runsc ARGS="nvproxy list-supported-drivers"
While gVisor doesn’t protect against all NVIDIA driver vulnerabilities, it does protect against a large set of general vulnerabilities in Linux. Applications don’t just use GPUs; they use them as part of a larger application that may include third-party libraries. For example, TensorFlow suffers from the same kinds of vulnerabilities that every application does. Designing and implementing an application with security in mind is hard, and in the emerging AI space security is often overlooked in favor of getting to market fast. There are also many services that allow users to run external users’ code on the vendor’s infrastructure. gVisor is well suited as part of a larger security plan for these and other use cases.