Signalfd support #139
Comments
How is this going? |
There is no one currently working on this. Is this something you could work on? |
We (I am from antfin) have requirements for signalfd, but we have not started working on this either. |
I took a stab at this here: It's relatively straightforward, but it first needs some changes to EventRegister / EventUnregister (plumbing through context). |
https://twitter.com/ptinsley/status/1150411814246715392?s=19 s6-overlay fails to start with:
From https://github.com/djbtao/skalibs/blob/c2ed75c9838767af60a05451b6d216331c1dbccf/src/libstddjb/selfpipe_init.c, I believe it needs signalfd. |
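For anyone who wants to check whether a given runtime provides signalfd before digging into s6-overlay itself, a small probe along these lines may help. This is only a sketch: `signalfd_available` is a hypothetical helper name, not something from skalibs or gVisor, and it assumes a Linux libc that exposes `<sys/signalfd.h>`.

```c
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical probe (not part of skalibs or gVisor): returns 1 if
 * signalfd() can be created in this environment, 0 otherwise. On failure
 * errno is left as set by signalfd(), e.g. ENOSYS when the syscall is
 * not implemented. */
int signalfd_available(void)
{
    sigset_t mask;
    sigemptyset(&mask);               /* an empty set is enough for a probe */

    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (fd < 0)
        return 0;

    close(fd);
    return 1;
}
```

Calling it from a tiny `main` inside the container and printing `strerror(errno)` when it returns 0 should show whether the failure is really a missing syscall.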
I have the same error as @prattmic when I deploy a container that uses s6-overlay on Cloud Run. |
Also ran into this issue when trying to deploy a container using s6-overlay on Cloud Run. |
Signalfd support has been merged in c98e7f0. This will roll out to Cloud Run at some point in the near future (depends on their release pipeline). |
I'll leave this open for now because the support in c98e7f0 diverged from the core Linux semantics in a couple of ways, though there will be no effect on how signalfds are used in libstddjb. |
I tested some more with this locally, and while containers with s6-overlay can now start successfully (yay!), the process that supervises services is not working properly. Normally, I created a small Docker container to reproduce this issue without all the other stuff I initially had installed in the container. The Dockerfile + context can be found here: All the debug logfiles from a run of this container are attached. |
@wmuizelaar I don't have a local gvisor setup but can you try this repo, which is a python based web server. It might be a more minimal repro, possibly. https://github.com/ahmetb/multi-process-container |
Sure! The result looks exactly the same. Log-output from the container:
When running 'regular' docker, it gives the starting-statements as well, which don't appear with gvisor:
Also, the output of
And now -with- gvisor:
Logfiles again: |
The signalfd descriptors otherwise always show as available. This can lead programs to spin, assuming they are looking to see what signals are pending. Updates #139 PiperOrigin-RevId: 272949671
Thanks! These repro cases were super useful in understanding the problem. I have a pull request (#972) to fix the bug and have validated that both examples work as expected with that in. (The pull request includes a test for the specific issue.) |
It would be great if you could also validate that the container works as expected with this change. |
Wow, that's a quick response! I can confirm that with #972 everything works as it should be. Thanks a lot! |
The signalfd descriptors otherwise always show as available. This can lead programs to spin, assuming they are looking to see what signals are pending. Updates #139 PiperOrigin-RevId: 274017890
Hey @wmuizelaar - is there any chance you can share how you got s6-overlay running in gVisor? With gVisor I run into a mkfifo operation not permitted issue... It's not really within the scope of this issue, but any hints would be awesome :) |
I remember seeing that one, I believe it was fixed by using the latest version of s6-overlay in combination with the latest gvisor nightly. In my Dockerfile there is just a |
@erulabs Do you have a specific version which is not working that I can test? I suspect this is a result of the execution environment / configuration and not strictly a missing feature. I believe that mkfifo is only supported on an in-sandbox tmpfs, i.e. the gofer will not create a host-based named pipe. There's work to ensure that EmptyDirs are automatically turned into sandbox-internal tmpfs mounts for Kubernetes (e.g. fc746ef is related). @fvoznika can comment further. You may be able to support those by specifying appropriate EmptyDirs for the application. |
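A quick way to find out which directories accept named pipes in a given sandbox configuration is to try creating one and look at the error. The following is a rough sketch under that assumption. `probe_mkfifo` is a made-up helper name, and the probe file name is arbitrary.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: try to create and then remove a named pipe inside
 * `dir`. Returns 0 on success, otherwise the errno value reported by
 * mkfifo() (for example EPERM when the mount refuses named pipes). */
int probe_mkfifo(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/.fifo-probe-%ld",
                     dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof path)
        return ENAMETOOLONG;

    if (mkfifo(path, 0600) < 0)
        return errno;

    unlink(path);                     /* clean up the probe pipe */
    return 0;
}
```

Running it against `/var/run`, the EmptyDir mount point, and `/tmp` should show which of them are backed by something that supports mkfifo.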
@wmuizelaar hrm - thanks for the reply :) @amscanne I'm using nightly gVisor (runsc --version returns runsc version release-20190806.1-329-g1c480abc39b9) and s6-overlay 1.22.0 and 1.22.1 and get this issue with both versions of s6:
I've tried mounting an EmptyDir after looking at the tmpfs stuff at /var/run and get the same issue - Here is a Kubernetes example: https://kubesail.com/template/erulabs/sonarr/1 (Running that on KubeSail reproduces the error - KubeSail uses gVisor under the hood 💃) Let me know if that's helpful - I'll keep digging on my side. Thanks! |
I think the EmptyDir stuff may still be in flight. Since it's unrelated to signalfd, I've forked off into a separate issue (#1102) where we can investigate. Thanks! |
All the basics of s6-overlay now work, but now I'm trying to do some more advanced things that fail, which might also be related to signalfd. What I'm trying to do is use s6-notifyoncheck to let one service be started after a previous one is deemed ready. This tool then runs a check-script, and uses a file descriptor to signal the output to. See https://skarnet.org/software/s6/s6-notifyoncheck.html and https://wiki.gentoo.org/wiki/S6#s6-notifyoncheck for details on how this should work. The part where you specify a specific file descriptor number that should be used for this process makes me assume we're hitting a corner case in signalfd support. When running without gvisor, I see the check executed every second (like it should); with gvisor I only see the first check attempt being made, but a second attempt is never made. Probably because the writing to the custom file descriptor hangs? I found it hard to troubleshoot where this is going wrong. Full logs attached; if there's any hint on how to further troubleshoot/pinpoint, that would be welcome! |
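One way to narrow this down might be to test the notification write on its own, separate from the signal handling. As far as the linked s6 documentation describes it, readiness is signalled by writing a newline to the notification descriptor. The sketch below mimics both ends over a plain pipe. It is an assumption-laden stand-in, not s6 code, and `notify_ready` / `wait_ready` are hypothetical names.

```c
#include <errno.h>
#include <unistd.h>

/* Writer side: send the single newline that marks readiness.
 * Retries on EINTR. Returns 0 on success, -1 on error. */
int notify_ready(int fd)
{
    for (;;) {
        ssize_t w = write(fd, "\n", 1);
        if (w == 1)
            return 0;
        if (w < 0 && errno == EINTR)
            continue;
        return -1;
    }
}

/* Reader side (stand-in for the supervisor): consume bytes until a
 * newline arrives. Returns 1 if a newline was seen, 0 on EOF before any
 * newline, -1 on error. */
int wait_ready(int fd)
{
    char c;
    for (;;) {
        ssize_t r = read(fd, &c, 1);
        if (r == 1) {
            if (c == '\n')
                return 1;
            continue;                 /* ignore anything before the newline */
        }
        if (r == 0)
            return 0;
        if (errno == EINTR)
            continue;
        return -1;
    }
}
```

If a pipe-based round trip like this works under gVisor but the real setup still stalls after the first check, that would point away from the descriptor write and towards the timing or signal side.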
gVisor seems to support mkfifo only on tmpfs: google/gvisor#139
Support for: