gVisor can add a layer of security to your TPU-based applications. By running these applications in a sandboxed environment, you can isolate your host system from potential vulnerabilities in application code. This is especially important when handling sensitive data or deploying untrusted workloads.
gVisor supports running workloads that leverage TPU accelerators by proxying hardware driver commands to the host with a feature called tpuproxy. tpuproxy exposes host TPU devices to the sandbox so users can run their TPU applications without any modifications.
The runsc flag --tpuproxy must be specified to enable TPU support. In GKE this is done automatically for any sandbox node using a supported TPU machine type.
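Outside GKE, the flag has to be passed to runsc by whatever launches it. As a minimal sketch, assuming Docker is the container engine and that runsc is installed at /usr/local/bin/runsc (both are assumptions, not stated above), the following Python snippet builds a runtime entry for Docker's daemon.json that passes --tpuproxy through the runtimeArgs field. The runtime name runsc-tpu is hypothetical.

```python
import json


def runsc_runtime_config(runsc_path="/usr/local/bin/runsc", name="runsc-tpu"):
    """Build a Docker daemon.json fragment that registers runsc with --tpuproxy.

    runsc_path and name are illustrative assumptions; adjust them to match
    the local installation.
    """
    return {
        "runtimes": {
            name: {
                "path": runsc_path,
                # --tpuproxy is the runsc flag that enables TPU support.
                "runtimeArgs": ["--tpuproxy"],
            }
        }
    }


config = runsc_runtime_config()
print(json.dumps(config, indent=2))
```

If the printed fragment is merged into the Docker daemon configuration and Docker is restarted, a container should be able to select this runtime with docker run --runtime=runsc-tpu.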
gVisor supports a wide range of workloads, including PyTorch and various generative models like LLMs. Check out this blog post about running Stable Diffusion with gVisor. gVisor undergoes continuous tests to ensure this functionality remains robust.
tpuproxy is a passthrough driver that forwards ioctl(2) calls made to TPU devices by the containerized application directly to the host TPU driver. This forwarding is straightforward: ioctl parameters are copied from the application’s address space to the sentry’s address space, and then a host ioctl syscall is made. ioctls are passed through with minimal intervention.
tpuproxy also sets up a proxy sysfs filesystem that enables reading configuration and status information of TPU devices on the host PCI bus. This design translates to minimal overhead and maximal compatibility for TPU operations, ensuring that TPU-bound workloads experience negligible performance impact.
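The copy-then-forward flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyAddressSpace, fake_host_ioctl, and the request code 0x1234 are hypothetical stand-ins, not gVisor or TPU driver APIs, and the final copy-back step is an assumption about how results would reach the application.

```python
import struct


class ToyAddressSpace:
    """A flat byte buffer standing in for a process's memory (illustrative only)."""

    def __init__(self, size):
        self.mem = bytearray(size)

    def copy_in(self, addr, length):
        # Return an independent copy of the bytes at [addr, addr + length).
        return bytes(self.mem[addr:addr + length])

    def copy_out(self, addr, data):
        # Write data into this address space starting at addr.
        self.mem[addr:addr + len(data)] = data


def fake_host_ioctl(request, params):
    """Stand-in for the host driver: doubles a little-endian uint32 parameter."""
    (value,) = struct.unpack("<I", params)
    return 0, struct.pack("<I", value * 2)


def proxy_ioctl(app_mem, request, arg_addr, arg_size, host_ioctl=fake_host_ioctl):
    # 1. Copy the parameter struct from application memory into the proxy.
    params = app_mem.copy_in(arg_addr, arg_size)
    # 2. Issue the ioctl against the host driver using the proxy's copy.
    ret, result = host_ioctl(request, params)
    # 3. Copy results back so the application sees the driver's output
    #    (assumed step; not described in the text above).
    app_mem.copy_out(arg_addr, result)
    return ret


app = ToyAddressSpace(64)
app.copy_out(16, struct.pack("<I", 21))  # application prepares its parameter
status = proxy_ioctl(app, request=0x1234, arg_addr=16, arg_size=4)
result = struct.unpack("<I", app.copy_in(16, 4))[0]
print(status, result)
```

The proxy never interprets the parameter bytes; it only moves them between address spaces, which is why the forwarding adds little overhead.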
gVisor currently supports the following TPU models: V4lite, V4pod, V5, and V5e. Open a GitHub issue if you want support for another TPU model. gVisor only supports “1VM” TPU shapes.
gVisor exposes /dev/accel[0-9]+ for TPU V4 and below. For TPU V5 and beyond, gVisor exposes /dev/vfio and /dev/vfio/[0-9]+. For all versions, tpuproxy exposes a read-only copy of the contents of TPU PCI device files located in the host’s sysfs directory.
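As a quick illustration of the device path patterns above, the following sketch classifies a path by the TPU interface it would belong to. The function name classify_tpu_device and the labels "accel" and "vfio" are hypothetical; only the path patterns come from the text.

```python
import re

# Patterns taken from the device paths described above.
ACCEL_DEVICE = re.compile(r"/dev/accel[0-9]+")
VFIO_DEVICE = re.compile(r"/dev/vfio(/[0-9]+)?")


def classify_tpu_device(path):
    """Return which TPU device interface a path belongs to, or None."""
    if ACCEL_DEVICE.fullmatch(path):
        return "accel"  # TPU V4 and below
    if VFIO_DEVICE.fullmatch(path):
        return "vfio"  # TPU V5 and beyond
    return None


for p in ["/dev/accel0", "/dev/vfio", "/dev/vfio/12", "/dev/null", "/dev/accel"]:
    print(p, classify_tpu_device(p))
```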
Although tpuproxy enables sandboxed applications to run TPU-accelerated workloads, gVisor does not provide the same level of isolation from host hardware for them that it does for traditional CPU workloads. At a high level, this is because gVisor emulates the Linux kernel, which itself has limited control over the memory isolation and compute scheduling of external devices. A more detailed discussion follows:
First, a short overview of gVisor’s security model. gVisor protects the host from sandboxed applications by providing several layers of defense. The layers most relevant to this discussion are the redirection of application syscalls to the gVisor sandbox and the use of seccomp-bpf on gVisor sandboxes.
gVisor uses a “platform” to tell the host kernel to reroute system calls to the sandbox process, known as the sentry. The sentry implements a syscall table, which services all application syscalls. The sentry may make syscalls to the host kernel if it needs them to fulfill the application syscall, but it doesn’t merely pass an application syscall to the host kernel.
On sandbox boot, seccomp filters are applied to the sandbox. These filters constrain the set of syscalls that the sandbox can make to the host kernel, blocking access to most host kernel vulnerabilities even if the sandbox becomes compromised.
For example, CVE-2022-0185 is mitigated because gVisor itself handles the syscalls required to use namespaces and capabilities, so the application is using gVisor’s implementation, not the host kernel’s. For a compromised sandbox, the syscalls required to exploit the vulnerability are blocked by seccomp filters.
In addition, seccomp-bpf filters can filter by argument values, allowing us to allowlist granularly by ioctl(2) arguments. ioctl(2) is a source of many bugs in any kernel due to the complexity of its implementation. As of writing, gVisor does allowlist some ioctls by argument for things like terminal support.
For example, CVE-2024-21626 is mitigated by gVisor because the application would use gVisor’s implementation of ioctl(2). For a compromised sentry, ioctl(2) calls with the needed arguments are not in the seccomp filter allowlist, blocking the attacker from making the call. gVisor also mitigates similar vulnerabilities that come with device drivers (CVE-2023-33107).
Recall that tpuproxy allows applications to directly interact with supported ioctls used by the TPU driver. gVisor’s seccomp filter rules are modified such that ioctl(2) calls can be made only for supported ioctls. This approach is similar to the allowlisted ioctls for terminal support described above. This allows gVisor to retain the vast majority of its protection for the host while allowing access to TPUs. All of the above CVEs remain mitigated even when tpuproxy is used.
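The allowlisting idea can be sketched as a toy filter that inspects the syscall number and the ioctl request argument. This is an illustration only: real seccomp-bpf filters are BPF programs evaluated by the host kernel, and the request codes 0xA001 and 0xA002 are hypothetical placeholders rather than actual TPU driver ioctls.

```python
from dataclasses import dataclass

SYS_IOCTL = 16  # ioctl syscall number on x86-64 Linux

# Hypothetical request codes; the real allowlist lives in gVisor's seccomp rules.
ALLOWED_IOCTL_REQUESTS = frozenset({0xA001, 0xA002})


@dataclass(frozen=True)
class Syscall:
    number: int
    args: tuple


def filter_syscall(call):
    """Return "ALLOW" or "DENY", mimicking an argument-aware allowlist."""
    if call.number != SYS_IOCTL:
        # This toy filter only models ioctl; everything else is out of scope.
        return "DENY"
    # ioctl(fd, request, argp): the request code is the second argument.
    request = call.args[1]
    return "ALLOW" if request in ALLOWED_IOCTL_REQUESTS else "DENY"


allowed = filter_syscall(Syscall(SYS_IOCTL, (3, 0xA001, 0)))
denied = filter_syscall(Syscall(SYS_IOCTL, (3, 0xBEEF, 0)))
print(allowed, denied)
```

Because the decision depends on the request argument and not just the syscall number, enabling a handful of TPU ioctls in this style would not open up ioctl(2) as a whole.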
However, gVisor is much less effective at mitigating vulnerabilities within the TPU drivers themselves, because gVisor passes through calls to be handled by the kernel driver. If there is a vulnerability in the TPU driver for a given ioctl that gVisor passes through, then gVisor will also be vulnerable.
In addition, gVisor doesn’t introduce any additional hardware-level isolation beyond that which is configured by the host. There is no validation of things like DMA buffers. The only checks are done in seccomp-bpf rules to ensure ioctl(2) calls are made on supported and allowlisted ioctls.
NOTE: TPU V5 and beyond use the VFIO Linux interface to drive TPU hardware. Theoretically, VFIO could be used to configure memory isolation using the host IOMMU. However, this requires manual setup by the user application and does not come configured out of the box by gVisor.
While gVisor doesn’t protect against all TPU driver vulnerabilities, it does protect against a large set of general vulnerabilities in Linux. Applications don’t just use TPUs; they use them as part of a larger application that may include third-party libraries. For example, TensorFlow suffers from the same kinds of vulnerabilities that every application does. Designing and implementing an application with security in mind is hard, and in the emerging AI space security is often overlooked in favor of getting to market fast. There are also many services that allow users to run external users’ code on the vendor’s infrastructure. gVisor is well suited as part of a larger security plan for these and other use cases.