gVisor has the ability to checkpoint a process, save its current state in a state file, and restore into a new container using the state file.
Checkpoint/restore functionality is currently available via raw runsc
commands. To use the checkpoint command, first run a container.
runsc run <container id>
To checkpoint the container, the --image-path flag must be provided. It
specifies the directory in which the checkpoint-related files will be created.
All necessary directories will be created if they do not yet exist.
Note: Two checkpoints cannot be saved to the same directory; every image-path provided must be unique.
runsc checkpoint --image-path=<path> <container id>
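Because every checkpoint needs its own directory, one way to avoid collisions is to generate a fresh path for each checkpoint. The following is only a sketch: new_image_path and the /tmp/runsc-checkpoints base directory are hypothetical names, and it assumes that an empty, pre-created directory is acceptable as the --image-path.

```shell
# Hypothetical helper (not part of runsc): create and print a fresh,
# unique directory under a base directory for each checkpoint.
new_image_path() {
  base="${1:-/tmp/runsc-checkpoints}"
  mkdir -p "$base" || return 1
  # mktemp -d replaces the trailing X characters with a unique suffix.
  mktemp -d "$base/ckpt.XXXXXX"
}

# Possible usage (commented out because it needs a running container):
# runsc checkpoint --image-path="$(new_image_path)" <container id>
```

Each call returns a different directory, so repeated checkpoints of the same container never share an image path.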
There is also an optional --leave-running flag that allows the container to
continue to run after the checkpoint has been made. (By default, containers stop
their processes after committing a checkpoint.)
Note: All top-level runsc flags needed when calling run must be provided to checkpoint if
--leave-running is used.
Note: --leave-running functions by causing an immediate restore, so the container, although it will maintain its given container id, may have a different process id.
runsc checkpoint --image-path=<path> --leave-running <container id>
To restore, provide the image path to the directory containing all the files created during the checkpoint. Because containers stop by default after checkpointing, restore needs to happen in a new container (restore is a command which parallels start).
runsc create <container id>
runsc restore --image-path=<path> <container id>
Note: All top-level runsc flags needed when calling run must be provided to
restore.
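One way to make sure that run, checkpoint, and restore all receive the same top-level flags is to keep them in a single variable. The sketch below only prints the resulting command lines (a dry run). RUNSC_TOP_FLAGS and runsc_cmd are hypothetical names introduced for illustration, and the --root value, container id, and image path are example values only.

```shell
# Assumed variable holding the top-level flags that were used with `run`.
# runsc itself does not read this variable; it is only used by the helper.
RUNSC_TOP_FLAGS="${RUNSC_TOP_FLAGS:---root=/tmp/runsc-example-root}"

# Hypothetical dry-run helper: prints the command line instead of running it,
# always placing the shared top-level flags before the subcommand.
runsc_cmd() {
  if [ -n "$RUNSC_TOP_FLAGS" ]; then
    printf 'runsc %s %s\n' "$RUNSC_TOP_FLAGS" "$*"
  else
    printf 'runsc %s\n' "$*"
  fi
}

# The same flags end up on every command in the sequence.
runsc_cmd run mycontainer
runsc_cmd checkpoint --image-path=/tmp/ckpt-example --leave-running mycontainer
runsc_cmd restore --image-path=/tmp/ckpt-example mycontainer
```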
gVisor supports several performance optimizations during checkpoint and restore.
These can be configured via flags provided to the runsc checkpoint and runsc
restore commands.
By providing the --compression flag to runsc checkpoint, users can specify
the compression level of the generated snapshot files. Supported values are
none (default) and flate-best-speed.
Note that --compression=none consumes less CPU and is faster. With this setting,
the generated snapshot consists of multiple files, which allows the kernel and
memory restores to proceed in parallel. Furthermore, several other optimizations
described below require --compression=none.
By providing the --exclude-committed-zero-pages flag to runsc checkpoint,
gVisor skips saving memory pages that are committed but contain only zeros. This
can significantly reduce the checkpoint size for applications that have large,
zero-filled memory regions (like LLMs), thereby speeding up restore. However, it
may increase checkpoint duration, as it requires scanning all committed pages to
determine if they are zero-filled.
By providing the --direct flag to runsc checkpoint or runsc restore,
gVisor uses O_DIRECT when writing or reading the pages file. This bypasses the
host page cache. This optimization requires --compression=none during
checkpoint. This is only supported on filesystems that support direct I/O.
This is particularly advantageous when the snapshot is being read for the first time from disk and will not be restored on the same machine again, making caching in the host page cache undesirable.
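The dependency between --direct and --compression=none can be checked before calling runsc. The following is a minimal sketch with a hypothetical helper named checkpoint_flags; it only assembles and prints flags described above and does not invoke runsc.

```shell
# Hypothetical helper: print the flags for `runsc checkpoint`, refusing
# --direct unless the compression level is none.
#   $1: compression level (none or flate-best-speed)
#   $2: whether to use direct I/O (yes or no)
checkpoint_flags() {
  compression="$1"
  direct="$2"
  case "$compression" in
    none|flate-best-speed) ;;
    *) echo "unsupported compression: $compression" >&2; return 1 ;;
  esac
  if [ "$direct" = yes ] && [ "$compression" != none ]; then
    echo "--direct requires --compression=none" >&2
    return 1
  fi
  flags="--compression=$compression"
  if [ "$direct" = yes ]; then
    flags="$flags --direct"
  fi
  printf '%s\n' "$flags"
}

# Possible usage (commented out because it needs a running container):
# runsc checkpoint --image-path=<path> $(checkpoint_flags none yes) <container id>
```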
By providing the --background flag to runsc restore, the application can
start execution as soon as the kernel state is loaded. The remaining application
memory and file data are restored asynchronously in the background while the
application is running. This optimization requires --compression=none during
checkpoint.
If the application accesses a memory page that has not yet been restored, gVisor prioritizes loading that page immediately to unblock the application thread. This can dramatically reduce the “Time to First Instruction” for large applications.
Note that when this is enabled, the sandbox may continue to have an open FD on the snapshot files even after the sandboxed application has started. This means that until the sandbox has fully restored (async page loading has completed):
You can use runsc wait --restore to wait for restore to complete fully, after
which you can clean up the --image-path directory if necessary.
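The wait-then-clean-up sequence could be scripted roughly as follows. This is a sketch under stated assumptions: cleanup_after_restore is a hypothetical helper, and the RUNSC variable is an assumed override that lets the function be exercised without a real runsc binary.

```shell
# Hypothetical helper: block until the restore has fully completed, then
# remove the checkpoint directory. The directory is left untouched if the
# wait command fails.
#   $1: container id
#   $2: the --image-path directory used for the restore
cleanup_after_restore() {
  container_id="$1"
  image_path="$2"
  "${RUNSC:-runsc}" wait --restore "$container_id" || return 1
  rm -rf -- "$image_path"
}

# Possible usage (commented out because it needs a real sandbox):
# runsc restore --image-path=<path> --background <container id>
# cleanup_after_restore <container id> <path>
```

Deleting the directory only after the wait succeeds avoids removing snapshot files that the sandbox may still be reading.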
Run a container:
docker run [options] --runtime=runsc --name=<container-name> <image>
Checkpoint the container:
docker checkpoint create <container-name> <checkpoint-name>
Restore into the same container:
docker start --checkpoint <checkpoint-name> <container-name>
--leave-running flag. This issue is fixed in newer releases.
--checkpoint-dir flag but this will be required when restoring from a
checkpoint made in another container.
When restoring a state file, gVisor verifies that the target host machine possesses all the CPU features enabled on the machine where the checkpoint snapshot was created.
gVisor allows users to specify a list of allowed CPU features using the
annotation dev.gvisor.internal.cpufeatures. Only the host CPU features present
in this annotation list will be enabled. By doing this, users are able to
stabilize the list of CPU features that will be exposed to applications in the
sandbox, which makes it possible to checkpoint and restore among machines with
different sets of CPU features.
CPU features in the annotation should be comma-separated. A comprehensive list of all supported CPU features can be found here.
The runsc command runsc cpu-features lists all CPU features on the current
machine.
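As an illustration of the comma-separated format, the sketch below joins a list of feature names into a single annotation value. join_features is a hypothetical helper, and the feature names are placeholders chosen for the example; the names to use should come from the supported list or from runsc cpu-features. The printed key/value pair is shown in the form it might take inside the annotations object of an OCI config.json, which is an assumption about where the annotation is set.

```shell
# Hypothetical helper: join all arguments with commas.
# "$*" joins the arguments using the first character of IFS; the subshell
# keeps the IFS change from leaking into the caller.
join_features() {
  ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder feature names, for illustration only.
features="$(join_features avx avx2 sse4_2)"

# Print the annotation as a JSON key/value pair.
printf '"dev.gvisor.internal.cpufeatures": "%s"\n' "$features"
```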
GPU checkpoint/restore is not supported on the arm64 architecture due to lack of support in cuda-checkpoint.