This guide describes how to obtain Prometheus monitoring data from gVisor sandboxes running with runsc.
NOTE: These metrics are mostly information about gVisor internals, and do not provide introspection capabilities into the workload being sandboxed. If you would like to monitor the sandboxed workload (e.g. for threat detection), refer to Runtime Monitoring.
runsc implements a Prometheus-compliant HTTP metric server using the runsc metric-server subcommand. This server is meant to run unsandboxed as a sidecar process of your container runtime (e.g. Docker).
You can export metric information from running sandboxes using the runsc export-metrics subcommand. This does not require special configuration or setting up a Prometheus server.
$ docker run -d --runtime=runsc --name=foobar debian sleep 1h
c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
$ sudo /path/to/runsc --root=/var/run/docker/runtime-runc/moby export-metrics c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
# Command-line export for sandbox c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed
# Writing data from snapshot containing 175 data points taken at 2023-01-25 15:46:50.469403696 -0800 PST.
# HELP runsc_fs_opens Number of file opens.
# TYPE runsc_fs_opens counter
runsc_fs_opens{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 62 1674690410469
# HELP runsc_fs_read_wait Time waiting on file reads, in nanoseconds.
# TYPE runsc_fs_read_wait counter
runsc_fs_read_wait{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 0 1674690410469
# HELP runsc_fs_reads Number of file reads.
# TYPE runsc_fs_reads counter
runsc_fs_reads{sandbox="c7ce77796e0ece4c0881fb26261608552ea4a67b2fe5934658b8b4433e5190ed"} 54 1674690410469
# [...]
Use the runsc metric-server subcommand:
$ sudo runsc \
--root=/var/run/docker/runtime-runc/moby \
--metric-server=localhost:1337 \
metric-server
--root needs to be set to the OCI runtime root directory that your runtime implementation uses. For Docker, this is typically /var/run/docker/runtime-runc/moby; otherwise, if you already have gVisor set up, you can run ps aux | grep runsc on the host to find the --root that a running sandbox is using. This directory is typically only accessible by the user Docker runs as (usually root), hence sudo. The metric server uses the --root directory to scan for sandboxes running on the system.
The --metric-server flag is the network address or UDS path to bind to. In this example, the server is bound to localhost on TCP port 1337, so it is only reachable over the lo interface. To listen on all interfaces instead, you could use --metric-server=:1337.
If something goes wrong, you may also want to add --debug --debug-log=/dev/stderr to understand the metric server’s behavior.
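For instance, combining the example invocation above with these debug flags could look like the following sketch (same paths and address as before):
$ sudo runsc \
    --root=/var/run/docker/runtime-runc/moby \
    --metric-server=localhost:1337 \
    --debug --debug-log=/dev/stderr \
    metric-server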
You can query the metric server with curl:
$ curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]
# HELP process_start_time_seconds Unix timestamp at which the process started. Used by Prometheus for counter resets.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1674598082.698509 1674598109532
# End of metric data.
Sandbox metrics are disabled by default. To enable them, add the flag --metric-server={ADDRESS}:{PORT} to the runtime configuration. With Docker, this can be set in /etc/docker/daemon.json like so:
{
"runtimes": {
"runsc": {
"path": "/path/to/runsc",
"runtimeArgs": [
"--metric-server=localhost:1337"
]
}
}
}
NOTE: The --metric-server flag value must be an exact string match between the runtime configuration and the runsc metric-server command.
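Docker only picks up /etc/docker/daemon.json changes at startup or on a configuration reload, so after editing it, restart the daemon. On a typical systemd-based host (assumed here) that would be:
$ sudo systemctl restart docker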
Once you’ve done this, you can start a container and see that it shows up in the list of Prometheus metrics.
$ docker run -d --runtime=runsc --name=foobar debian sleep 1h
32beefcafe
$ curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# Writing data from 3 snapshots: [...]
# HELP process_start_time_seconds Unix timestamp at which the process started. Used by Prometheus for counter resets.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1674599158.286067 1674599159819
# HELP runsc_fs_opens Number of file opens.
# TYPE runsc_fs_opens counter
runsc_fs_opens{iteration="42asdf",sandbox="32beefcafe"} 12 1674599159819
# HELP runsc_fs_read_wait Time waiting on file reads, in nanoseconds.
# TYPE runsc_fs_read_wait counter
runsc_fs_read_wait{iteration="42asdf",sandbox="32beefcafe"} 0 1674599159819
# [...]
# End of metric data.
Each per-container metric is labeled with at least:
- sandbox: The container ID, in this case 32beefcafe.
- iteration: A randomly-generated string (in this case 42asdf) that stays constant for the lifetime of the sandbox. This helps distinguish between successive instances of the same sandbox with the same ID.

If you’d like to run some containers with metrics turned off and some on within the same system, use two runtime entries in /etc/docker/daemon.json with only one of them having the --metric-server flag set.
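For example, a /etc/docker/daemon.json along these lines would give you one runtime with metrics and one without (the runsc-no-metrics name is only illustrative):
{
  "runtimes": {
    "runsc": {
      "path": "/path/to/runsc",
      "runtimeArgs": [
        "--metric-server=localhost:1337"
      ]
    },
    "runsc-no-metrics": {
      "path": "/path/to/runsc",
      "runtimeArgs": []
    }
  }
}
Containers started with --runtime=runsc then report metrics, while containers started with --runtime=runsc-no-metrics do not.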
The metric server exposes a standard /metrics HTTP endpoint on the address given by the --metric-server flag passed to runsc metric-server. Simply point Prometheus at this address.
If desired, you can change the exporter name (prefix applied to all metric names) using the --exporter-prefix flag. It defaults to runsc_.
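As an illustration, assuming --exporter-prefix is passed like the other global flags in the examples above, starting the server with a gvisor_ prefix might look like this; a metric such as runsc_fs_opens would then be exported as gvisor_fs_opens instead:
$ sudo runsc \
    --root=/var/run/docker/runtime-runc/moby \
    --metric-server=localhost:1337 \
    --exporter-prefix=gvisor_ \
    metric-server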
The sandbox metrics exported may be filtered by using the optional GET parameter runsc-sandbox-metrics-filter, e.g. /metrics?runsc-sandbox-metrics-filter=fs_.*. Metric names must fully match this regular expression. Note that this filtering is performed before prepending --exporter-prefix to metric names.
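For example, quoting the URL so that the shell does not interpret ? and *:
$ curl 'http://localhost:1337/metrics?runsc-sandbox-metrics-filter=fs_.*'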
The metric server also supports listening on a Unix Domain Socket. This can be convenient to avoid reserving port numbers on the machine’s network interface, or for tighter control over who can read the data. Clients should talk HTTP over this UDS. While Prometheus doesn’t natively support reading metrics from a UDS, this feature can be used in conjunction with a tool such as socat to re-expose this as a regular TCP port within another context (e.g. a tightly-managed network namespace that Prometheus runs in).
$ sudo runsc --root=/var/run/docker/runtime-runc/moby --metric-server=/run/docker/runsc-metrics.sock metric-server &
$ sudo curl --unix-socket /run/docker/runsc-metrics.sock http://runsc-metrics/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]
# End of metric data.
# Set up socat to forward requests from *:1337 to /run/docker/runsc-metrics.sock in its own network namespace:
$ sudo unshare --net socat TCP-LISTEN:1337,reuseaddr,fork UNIX-CONNECT:/run/docker/runsc-metrics.sock &
# Set up basic networking for socat's network namespace:
$ sudo nsenter --net="/proc/$(pidof socat)/ns/net" sh -c 'ip link set lo up && ip route add default dev lo'
# Grab metric data from this namespace:
$ sudo nsenter --net="/proc/$(pidof socat)/ns/net" curl http://localhost:1337/metrics
# Data for runsc metric server exporting data for sandboxes in root directory /var/run/docker/runtime-runc/moby
# [...]
# End of metric data.
If you would like to run the metric server in a gVisor sandbox, you may do so, provided that you give it access to the OCI runtime root directory, forward the network port it binds to for external access, and enable host UDS support.
WARNING: Doing this does not provide you the full security of gVisor, as it still grants the metric server full control over all running gVisor sandboxes on the system. This step is only a defense-in-depth measure.
To do this, add a runtime with the --host-uds=all flag to /etc/docker/daemon.json. The metric server needs the ability to open existing UDSs (in order to communicate with running sandboxes), and to create new UDSs (in order to create and listen on /run/docker/runsc-metrics.sock).
{
"runtimes": {
"runsc": {
"path": "/path/to/runsc",
"runtimeArgs": [
"--metric-server=/run/docker/runsc-metrics.sock"
]
},
"runsc-metric-server": {
"path": "/path/to/runsc",
"runtimeArgs": [
"--metric-server=/run/docker/runsc-metrics.sock",
"--host-uds=all"
]
}
}
}
Then start the metric server with this runtime, passing through the directories containing the control files runsc uses to detect and communicate with running sandboxes:
$ docker run -d --runtime=runsc-metric-server --name=runsc-metric-server \
--volume="$(which runsc):/runsc:ro" \
--volume=/var/run/docker/runtime-runc/moby:/var/run/docker/runtime-runc/moby \
--volume=/run/docker:/run/docker \
--volume=/var/run:/var/run \
alpine \
/runsc \
--root=/var/run/docker/runtime-runc/moby \
--metric-server=/run/docker/runsc-metrics.sock \
--debug --debug-log=/dev/stderr \
metric-server
Yes, this means the metric server will report data about its own sandbox:
$ metric_server_id="$(docker inspect --format='{{.Id}}' runsc-metric-server)"
$ sudo curl --unix-socket /run/docker/runsc-metrics.sock http://runsc-metrics/metrics | grep "$metric_server_id"
# - Snapshot with 175 data points taken at 2023-01-25 15:45:33.70256855 -0800 -0800: map[iteration:2407456650315156914 sandbox:737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481]
runsc_fs_opens{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 54 1674690333702
runsc_fs_read_wait{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 0 1674690333702
runsc_fs_reads{iteration="2407456650315156914",sandbox="737ce142058561d764ad870d028130a29944821dd918c7979351b249d5d30481"} 52 1674690333702
# [...]
When using Kubernetes, users typically deal with pod names and container names. On Kubelet machines, the underlying container names passed to the runtime are non-human-friendly hexadecimal strings.
In order to provide more user-friendly labels, the metric server will pick up the io.kubernetes.cri.sandbox-name and io.kubernetes.cri.sandbox-namespace annotations provided by containerd, and automatically add these as labels (pod_name and namespace_name respectively) for each per-sandbox metric.
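As an illustration (with made-up pod and namespace values), a per-sandbox metric would then carry labels along these lines:
runsc_fs_opens{iteration="42asdf",namespace_name="default",pod_name="my-pod",sandbox="32beefcafe"} 12 1674599159819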
The metric server exports a lot of gVisor-internal metrics, and generates its own metrics as well. All metrics have documentation and type annotations in the /metrics output, and this section aims to document some useful ones.
- process_start_time_seconds: Unix timestamp representing the time at which the metric server started. This specific metric name is used by Prometheus, and as such its name is not affected by the --exporter-prefix flag. This metric is process-wide and has no labels.
- num_sandboxes_total: A process-wide metric representing the total number of sandboxes that the metric server knows about.
- num_sandboxes_running: A process-wide metric representing the number of running sandboxes that the metric server knows about.
- num_sandboxes_broken_metrics: A process-wide metric representing the number of sandboxes from which the metric server could not get metric data.
- sandbox_presence: A per-sandbox metric that is set to 1 for each sandbox that the metric server knows about. This can be used to join with other per-sandbox or per-pod metrics for which metric existence is not guaranteed.
- sandbox_running: A per-sandbox metric that is set to 1 for each sandbox that the metric server knows about and that is actively running. This can be used in conjunction with sandbox_presence to determine the set of sandboxes that aren’t running; useful if you want to alert about sandboxes that are down.
- sandbox_metadata: A per-sandbox metric that carries a superset of the typical per-sandbox labels found on other per-sandbox metrics. These extra labels contain useful metadata about the sandbox, such as the version number, platform, and network type being used.
- sandbox_capabilities: A per-sandbox, per-capability metric that carries the union of all capabilities present on at least one container of the sandbox. Can optionally be filtered to only a subset of capabilities using the runsc-capability-filter GET parameter on /metrics requests (regular expression), as shown in the example after this list. Useful for auditing and aggregating the capabilities you rely on across multiple sandboxes.
- sandbox_creation_time_seconds: A per-sandbox Unix timestamp representing the time at which this sandbox was created.
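As an example of the sandbox_capabilities filter mentioned above, a query restricted to networking-related capabilities might look like this (the CAP_NET_.* pattern is only illustrative):
$ curl 'http://localhost:1337/metrics?runsc-capability-filter=CAP_NET_.*'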