Implement membarrier #267

Open
ianlewis opened this issue May 31, 2019 · 24 comments
Labels
area: compatibility (Issue related to (Linux) kernel compatibility) · priority: p3 (Low priority) · type: enhancement (New feature or request)

Comments

@ianlewis
Contributor

ianlewis commented May 31, 2019

http://man7.org/linux/man-pages/man2/membarrier.2.html

Note: Very few applications have a hard requirement for membarrier. If you encounter a warning about unimplemented membarrier, the application most likely attempted to use membarrier, triggering the warning, and then fell back to another mechanism.
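
The try-then-fall-back behavior described in the note can be sketched roughly as follows. This is a minimal illustration, not code from any of the applications mentioned in this issue: the `sketch_*` names are hypothetical, and the command value is copied from `<linux/membarrier.h>` so the sketch does not depend on the installed kernel headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MEMBARRIER_CMD_QUERY in <linux/membarrier.h>, defined locally
 * (assumption: keeps the sketch buildable against older headers). */
#define SKETCH_MEMBARRIER_CMD_QUERY 0

/* Hypothetical helper: returns the bitmask of supported membarrier
 * commands, or -1 with errno set if the syscall is unavailable. */
static long sketch_membarrier_query(void) {
#ifdef SYS_membarrier
    return syscall(SYS_membarrier, SKETCH_MEMBARRIER_CMD_QUERY, 0, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Hypothetical helper: probe once at startup. A false result means the
 * caller should use its other mechanism; it is not treated as fatal. */
static bool sketch_can_use_membarrier(void) {
    return sketch_membarrier_query() > 0;
}
```

An application written this way still issues the syscall once on a platform without membarrier, which is enough to trigger the warning even though the application then carries on with its fallback.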

@ianlewis ianlewis added type: enhancement New feature or request area: compatibility Issue related to (Linux) kernel compatibility priority: p3 Low priority labels May 31, 2019
@razfriman

Yes please! I'm having issues running C# ASP.NET Core containers on gVisor because of this:

Container Sandbox Limitation: Unsupported syscall membarrier(0x0,0x0,0xb,0x0,0x3e0619a02108,0x1). Please, refer to https://gvisor.dev/c/linux/amd64/membarrier

Any workarounds known?

@prattmic
Member

As far as I know, coreclr does not require membarrier. It attempts to use it, but has a fallback if it is unsupported:

https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L141-L143
https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L3484-L3494
https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L3548

Are you certain that membarrier is the problem? These log messages do not necessarily indicate that they are the cause of a failure. Perhaps you can open a new issue with additional information about the failure.

@razfriman

Sure, I'll open a new issue. Thanks for the swift response!

@razfriman

As far as I know, coreclr does not require membarrier. It attempts to use it, but has a fallback if it is unsupported:
https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L141-L143
https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L3484-L3494
https://github.com/dotnet/coreclr/blob/b283f8c9833d9c38b4e21640c6aada16fd642bde/src/pal/src/thread/process.cpp#L3548
Are you certain that membarrier is the problem? These log messages do not necessarily indicate that they are the cause of a failure. Perhaps you can open a new issue with additional information about the failure.

Great idea, I have created a new issue here: #1036 Thank you!

EDIT: My issue was unrelated in the end - Please feel free to ignore my comment

@znorris

znorris commented Oct 28, 2019

I'm getting this issue when using ffmpeg in GCP Cloud Run.

@prattmic
Member

ffmpeg does not require membarrier in any configuration that I am aware of, so I suspect that warning is simply misleading and failures are due to some other reason. It simply tries to use membarrier (emitting this warning), then falls back to another mechanism.

Please provide additional logs if you think this really is related to membarrier.

gvisor-bot pushed a commit that referenced this issue Nov 4, 2019
Updates #267

PiperOrigin-RevId: 278402684
@meteatamel

I see the following in the logs when I deploy a simple ASP.NET Core 3.0 app on Cloud Run:

Container Sandbox Limitation: Unsupported syscall membarrier(0x8,0x0,0x0,0x1,0x3ef991203508,0x3ef991203518). Please, refer to https://gvisor.dev/c/linux/amd64/membarrier for more information.

It doesn't seem to cause issues but it makes you think that something is wrong.
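
To see which membarrier command a given warning refers to, the first argument in the logged call can be decoded. The sketch below is only an illustration: the `sketch_*` helpers are hypothetical, it assumes the log format shown in this thread (`membarrier(` followed by hexadecimal arguments, with the command first), and the command values are copied from `<linux/membarrier.h>`.

```c
#include <stdlib.h>
#include <string.h>

/* Command values as defined in <linux/membarrier.h>, copied locally. */
struct sketch_membarrier_cmd {
    unsigned long value;
    const char *name;
};

static const struct sketch_membarrier_cmd sketch_cmds[] = {
    {0x0,  "MEMBARRIER_CMD_QUERY"},
    {0x1,  "MEMBARRIER_CMD_GLOBAL"},
    {0x2,  "MEMBARRIER_CMD_GLOBAL_EXPEDITED"},
    {0x4,  "MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED"},
    {0x8,  "MEMBARRIER_CMD_PRIVATE_EXPEDITED"},
    {0x10, "MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED"},
};

/* Hypothetical helper: extract the first argument of "membarrier(...)"
 * from a log line. Returns 0 on success, -1 if no call is found. */
static int sketch_parse_membarrier_cmd(const char *log_line, unsigned long *cmd) {
    const char *prefix = "membarrier(";
    const char *p = strstr(log_line, prefix);
    if (p == NULL) return -1;
    p += strlen(prefix);
    char *end = NULL;
    unsigned long v = strtoul(p, &end, 16); /* base 16 accepts the 0x prefix */
    if (end == p) return -1;
    *cmd = v;
    return 0;
}

/* Hypothetical helper: map a command value to its header name. */
static const char *sketch_membarrier_cmd_name(unsigned long cmd) {
    for (size_t i = 0; i < sizeof(sketch_cmds) / sizeof(sketch_cmds[0]); i++) {
        if (sketch_cmds[i].value == cmd) return sketch_cmds[i].name;
    }
    return "unknown";
}
```

Applied to the log lines quoted in this issue, a first argument of 0x0 decodes to MEMBARRIER_CMD_QUERY (a capability probe) and 0x8 decodes to MEMBARRIER_CMD_PRIVATE_EXPEDITED.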

@andresamayadiaz

Having the same issue with a NodeJS App.

@kscerbiakas

kscerbiakas commented Dec 3, 2019

Same issue trying to run NodeJS too. Any workarounds or fixes?

UPDATE
I managed to get my app running.
I had a couple of issues with Cloud Run connecting to Cloud SQL, so if your app uses MySQL, make sure you have all the APIs enabled for your project, the right IAM permissions, etc.
After that it started to boot properly and work.
But I still get the membarrier error fairly randomly. When that happens, I just launch a new revision with the same settings and it launches OK.

@whollacsek

whollacsek commented Dec 4, 2019 via email

@ianlewis
Contributor Author

ianlewis commented Dec 4, 2019

@andresamayadiaz @kscerbiakas @whollacsek Are all of you encountering the issue when using Cloud Run? What exactly are the issues you encounter?

Again, it's likely not the lack of membarrier causing the issue but rather something else (or a combination of things). If we can get enough info on the issues folks are encountering we could start narrowing down the issue better.

@whollacsek

whollacsek commented Dec 5, 2019 via email

@prattmic
Member

prattmic commented Dec 5, 2019

@whollacsek Are you using an Alpine-based Node.JS image? If so, I suspect you are hitting nodejs/docker-node#1158.

While that doesn't seem to be a gVisor issue (as it occurs on vanilla AWS VMs), I've been trying to track down the issue. Unfortunately, I haven't been able to reproduce crashes with any of my test apps.

Do you (or anyone else experiencing segfaults (signal 11)) have an image you can share that reproduces this issue?

Alternatively, any strace logs you can provide would be very helpful.

  • Ideally, use runsc with debugging enabled and provide the .boot log file.
  • If you can't reproduce with runsc locally, you can install strace in your image and run with strace on Cloud Run. I think on Alpine it would be apk update && apk add strace to install strace.

Some workarounds that may resolve this issue:

  • Pin to the previous version of the Node base image. Assuming you are on node:10-alpine, pin to node:10.16.1-alpine. If possible, also try node:10.17.0-alpine3.9. The latest release of node:10-alpine upgraded Node from 10.16.1 to 10.17 and Alpine from 3.9 to 3.10. I'd be curious to know which of those changes introduces the crash.
  • Try upgrading to Node 13. In my investigation thus far, Node 13 seems to have several changes that will make it behave better with musl libc (used by Alpine). It's possible that Node 13 won't encounter issues.
  • Switch to a non-Alpine base image, like node:10-slim.

@whollacsek

whollacsek commented Dec 5, 2019 via email

@kscerbiakas

@prattmic I switched images to node:10.17.0-alpine3.9 and node:10-stretch, and it seems to be working fine. Fingers crossed. Today I didn't receive any membarrier messages whatsoever. Thank you!

One more question: how can I improve performance? I get too many "Rate exceeded" errors. Any tips?

@whollacsek

whollacsek commented Dec 6, 2019 via email

@prattmic
Member

prattmic commented Dec 6, 2019

Any chance either of you could provide strace logs from a crashing instance? nodejs/docker-node#1158 still has no repro or additional information, so anything more we can get will help resolve the underlying issue.

@takashi

takashi commented Feb 27, 2020

I'm facing the same issue on a ruby:2.6.5-alpine based image.
Does anyone know a workaround for this? I'm thinking the better choice for now is just to stop using the Alpine image.

@kaisellgren

kaisellgren commented May 14, 2020

I get this error too when using NodeJS with https://github.com/lovell/sharp
That library is based on a C library: https://github.com/libvips/libvips

I can't get my Cloud Run up at all. Reverted to an earlier version without this node module and now things are running again.

Edit: It turned out the issue was caused by node-alpine base image.

@stevefan1999-personal

I see that many people are pointing the finger at Alpine. Maybe musl libc, which Alpine uses, makes some specific use of membarrier?

@ianlewis
Contributor Author

Another user had issues and saw membarrier not supported log messages on Cloud Run.
https://twitter.com/knjplvsbr/status/1308111376359895040

They are using Python which I think should fall back but their base Docker image is alpine. Maybe another case of alpine being the cause?
https://github.com/jd10ne/snap_site

@nixprime
Member

nixprime commented Sep 22, 2020

Another user had issues and saw membarrier not supported log messages on Cloud Run.
https://twitter.com/knjplvsbr/status/1308111376359895040

They are using Python which I think should fall back but their base Docker image is alpine. Maybe another case of alpine being the cause?
https://github.com/jd10ne/snap_site

Alpine Linux uses musl libc, which does invoke both membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED=0x10, 0) as reported in that Twitter chain, and membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED=0x8, 0) as reported elsewhere in this issue. However, musl ignores the result of MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (which enables use of MEMBARRIER_CMD_PRIVATE_EXPEDITED) and has a silent fallback for MEMBARRIER_CMD_PRIVATE_EXPEDITED, which consists of doing nearly the same thing we would do (sending an interrupt signal to each thread in the process), so lack of membarrier support is not fatal to musl.
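
The call sequence described above (register, ignore the result, then attempt the expedited barrier and fall back silently) can be sketched as follows. This is not musl's source: the `sketch_*` names are hypothetical, the command values are the ones quoted in the comment, and the fallback here is only a placeholder, since the real fallback described above signals every thread in the process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Command values quoted in the comment above. */
#define SKETCH_CMD_PRIVATE_EXPEDITED          0x8
#define SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED 0x10

/* Hypothetical wrapper around the raw syscall; returns -1 on failure. */
static long sketch_membarrier(int cmd) {
#ifdef SYS_membarrier
    return syscall(SYS_membarrier, cmd, 0, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Registration step: the result is deliberately ignored, as described. */
static void sketch_init(void) {
    (void)sketch_membarrier(SKETCH_CMD_REGISTER_PRIVATE_EXPEDITED);
}

/* PLACEHOLDER fallback (assumption): a local full fence only. It is NOT
 * equivalent to signalling every thread and exists just to mark the spot. */
static void sketch_fallback_barrier(void) {
    __sync_synchronize();
}

/* Returns 1 if the kernel performed the barrier, 0 if the fallback ran. */
static int sketch_process_wide_barrier(void) {
    if (sketch_membarrier(SKETCH_CMD_PRIVATE_EXPEDITED) == 0) {
        return 1;
    }
    sketch_fallback_barrier();
    return 0;
}
```

With this shape, a missing membarrier implementation changes only which branch runs in `sketch_process_wide_barrier`; nothing is reported to the caller, which matches the observation that the failure is silent and not fatal.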

copybara-service bot pushed a commit that referenced this issue Sep 22, 2020
MEMBARRIER_CMD_PRIVATE_EXPEDITED is not implemented by interrupting all threads
in the thread group because the actual implementation on Linux interrupts all
running threads sharing the MM, which is sufficiently different that doing the
wrong thing would risk silently corrupting application memory.

Updates #267

PiperOrigin-RevId: 333018403
@ianlewis
Contributor Author

Alpine Linux uses musl libc, which does invoke both membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED=0x10, 0) as reported in that Twitter chain, and membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED=0x8, 0) as reported elsewhere in this issue. However, musl ignores the result of MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED (which enables use of MEMBARRIER_CMD_PRIVATE_EXPEDITED) and has a silent fallback for MEMBARRIER_CMD_PRIVATE_EXPEDITED, which consists of doing nearly the same thing we would do (sending an interrupt signal to each thread in the process), so lack of membarrier support is not fatal to musl.

I vaguely remember that it may not have worked with older versions, but I can't recall for sure. In this case anyway, the Python program was apparently not taking connections and clients were timing out rather than crashing, so that leads me to think it's likely a red herring.

copybara-service bot pushed a commit that referenced this issue Sep 24, 2020
copybara-service bot pushed a commit that referenced this issue Oct 2, 2020
copybara-service bot pushed a commit that referenced this issue Oct 6, 2020