Tracking issue for oom=panic (RFC 2116)
Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.
Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.
We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default oom=abort semantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to be a panic or abort based on this flag?
This is not a replacement for proper fallible allocation handling routines requested in #29802 because most of the potential users of that API build with panic=abort. Also the requesters of this API are unwilling to go through the effort to ensure all their dependents are using fallible allocations everywhere.
I am dubious on this API but I'm filling an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.
Update: This now the tracking issue for a mechanism to make OOM / memory allocation errors panic (and so by default unwind the thread) instead of aborting the process. This feature was accepted in RFC https://github.com/rust-lang/rfcs/pull/2116.
It was separated from https://github.com/rust-lang/rust/issues/48043, which tracks another feature of the same RFC, since the two features will likely be stabilized at different times.
Blockers
This should be blocked until we can audit all places in the code that allocate to ensure they safely handle unwinding on OOM.
- [ ] https://github.com/rust-lang/rust/issues/123369
- [ ] https://github.com/rust-lang/unsafe-code-guidelines/issues/506
Steps:
- [ ] Implement the RFC
- [ ]
std::alloc::set_allocation_error_hook(tracked at https://github.com/rust-lang/rust/issues/51245) could potentially be this mechanism, but that hook is currently document as not allowed to unwind.
- [ ]
- [ ] Adjust documentation (see instructions on forge)
- [ ] Stabilization PR (see instructions on forge)
Original feature request:
Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.
Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.
We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default
oom=abortsemantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to be a panic or abort based on this flag?This is not a replacement for proper fallible allocation handling routines requested in #29802 because most of the potential users of that API build with
panic=abort. Also the requesters of this API are unwilling to go through the effort to ensure all their dependents are using fallible allocations everywhere.I am dubious on this API but I'm filling an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.
Simon Sapin at 2018-06-27 12:40:05
Note that in previous topics I stated that libunwind doesn't handle OOM well, but this appears to be outdated or incorrect information. They preallocate their memory before an unwind is even requested. In the limit they might run out of memory and give up but it's not like we've lost anything for trying.
Aria Desires at 2017-08-01 18:58:33
I figured I'd comment on this with respect to https://github.com/rust-lang/rust/issues/42808 and the outcome there along with the @rust-lang/libs team discussion. It was discovered in https://github.com/rust-lang/rust/issues/42808 that the integration of the
Alloctrait caused a number of code size related regressions which when diagnosed were almost exclusively related to unwind tables and landing pads. This investigation then caused https://github.com/rust-lang/rust/pull/44049 as the only two stable implementations of allocators don't panic.Note, though, that this is not necessarily directly related to OOM but just allocator implementations themselves. Should we forbid, in general, panics from allocators? Should we only allow the oom method to panic?
Unsure!
Alex Crichton at 2017-08-23 21:57:48
I personally think it is reasonable to forbid the allocator itself from panicking and instead have the
oommethod be responsible for handling allocation failures and panicking or aborting as necessary. I also think it would be useful ifoomwas told the size of the allocation that failed so it could heuristically determine whether it is an actual memory exhaustion event or just someone accidentally specifying a silly large number.Peter Atashian at 2017-08-23 22:05:40
@retep998
fn oomdoes receive theAllocErr, which forAllocErr::Exhaustedwill carry the inputLayoutthat was requested. Does that not suffice for what you describe?(I personally wanted to try to stuff the
Layoutonto bothAllocErrvariants, but I worried about bloating the size of the error type after it was no longer an associated item for the allocator...)Felix S Klock II at 2017-08-24 20:32:36
@pnkfelix Yep, that would do exactly what I want. It's my fault for not looking at the existing API and seeing that we already had that 😛
Peter Atashian at 2017-08-24 20:39:35
RFC https://github.com/rust-lang/rfcs/pull/2116 was accepted, I’ve repurposed this feature request into a tracking issue. See more in this thread’s original message.
Simon Sapin at 2018-06-27 12:40:45
We now have
std::alloc::set_alloc_error_hook(tracked at https://github.com/rust-lang/rust/issues/51245) which is document as not allowed to unwind. If that restriction could be lifted, this hook could be the mechanism to implement this feature (by having the program register at start-up a hook that panics).Simon Sapin at 2018-06-27 12:46:35
Minor question: if someone sets
oom=panicandpanic=abort, should that function identically tooom=abort, or should it be a compilation error with the rationale that such a configuration is likely a mistake on the user's part?bstrie at 2018-10-15 05:22:13
I expect it would be mostly identical, though the error message that’s printed might not be exactly the same. I don’t know if it’s worth forbidding explicitly even if it’s not particularly useful. An observable effect (though again not particularly useful since
std::alloc::set_alloc_error_hookalso exists) would be that a hook set throughstd::panic::set_hookis run on OOM.Simon Sapin at 2018-10-15 07:03:40
Making allocation errors default to panic has undergone FCP successfully: https://github.com/rust-lang/rust/issues/66741
I wonder if there's a point to this cargo-level feature anymore?
Ehsanul Hoque at 2020-05-11 02:37:09
https://github.com/rust-lang/rust/issues/66741 is only relevant in programs where
libstdis not used at all. This issue is about the behavior oflibstd.Simon Sapin at 2020-05-11 13:42:51
I had sent my ideas to https://github.com/rust-lang/wg-allocators/issues/62 ; but as it looks like there already is a tracking issue, let's move the comments here.
Assume a rust server running multiple things on behalf of clients. It uses regular Rust code, that is not written to care specifically about this particular use case.
The server wants to limit the memory consumption clients could force upon it. It does so by having each client's allocations be in an arena allocator, and then wants to abort just that client's connection in case of “OOM.”
Unfortunately, this means that OOM must not abort the whole process, and AFAIU that is what the current behavior is, even with set_alloc_error_hook -- or at least so do the docs read like.
I'm not sure
oom=panicas a global setting could work properly in all cases, as potentially panicking requires memory allocation? But I think that already allowingset_alloc_error_hookto panic would allow the user to swap the error hook at the same time as the allocator, and in case of OOM revert to the global allocator and then, once it's possible to allocate again, perform the panic.Does that make sense?
Léo Gaspard at 2020-05-16 01:04:15
It's probably the case that (a) the amount of memory needed to alloc panic data is fairly low and (b) you'd hit oom from asking for some large amount that can't be offered. In this situation, you'd probably be able to allocate the panic data.
However, if you fail to allocate while panicing you'll just get a panic, and a panic during panic is an abort. So at least you're not trapped in a loop ;P
But also: you can't carelessly change the global allocator while allocated things are present in the world or they'll get potentially freed on the wrong allocator, which is obviously bad.
Basically the entire scenario you want can't be done with rust's current conventions regarding memory allocation. You'd need specialized local-allocator using code, or waiting for alloc-wg to make the entire ecosystem more alloc aware over time.
Lokathor at 2020-05-16 01:17:03
Just for clarification: I was assuming the existence of a global allocator that is able to use different memory pools for allocation depending on some thread-local. Entering a task's context would set the thread-local to the task's memory pool, and either exiting it or hitting an OOM during it would reset it to the global memory pool. It knows which memory pool to free from without the memory pool being necessarily the same as the one that was configured during allocation, thanks to knowing the address of the memory block to free.
By doing this, I think that what I want to do can be achieved with the current conventions regarding memory allocations, but iff
set_alloc_error_hookcan be set not to abort, but to panic instead :)Léo Gaspard at 2020-05-16 12:37:03
Oh, and I forgot to answer your initial points: unfortunately, a client able to make the server abort is a DoS vulnerability, so… that's not a good thing :( And I can totally see the client being able to fill the pool by making the server allocate one byte at a time -- the server could try to prevent that by keeping its own counter of how much memory each task has used, but it requires potentially large changes in every library used by the server, while the solution based on allocators would work without requiring cooperation from every piece of the code (including those I don't control) :)
Léo Gaspard at 2020-05-16 12:40:50
It knows which memory pool to free from without the memory pool being necessarily the same as the one that was configured during allocation, thanks to knowing the address of the memory block to free.
Yes, this aspect is critical to the soundness of this kind of scheme.
iff
set_alloc_error_hookcan be set not to abort, but to panic instead :)Yes, see https://github.com/rust-lang/rust/issues/43596#issuecomment-400659080. I don’t know why this restriction exists, though.
Simon Sapin at 2020-05-16 16:02:55
For why this restriction exists, I don't know why either, but (for other people who may be reading this, I know you've already read it) the three messages starting from https://github.com/rust-lang/rust/issues/51245#issuecomment-393503805 explain that it's indeed here :)
My first guess would be: panicking might need to allocate itself on some platforms… maybe? And potentially this allocation would not be handled by rust, and thus not be covered by rust's allocation handling, and then Things would happen.
Now, for use cases like the one I describe, where there still is memory available for the program and it's just an arena that got full, allowing the replacement would make sense. But maybe it'd require
set_alloc_error_hookto beunsafe?Léo Gaspard at 2020-05-16 16:58:59
panicking might need to allocate itself on some platforms… maybe?
It does, on all platforms.
First if you use
panic!with a formatting string and additional arguments, the formatted string is allocated:https://github.com/rust-lang/rust/blob/31add7e60709445617ab54a69f6f21cfcb2e3122/src/libstd/panicking.rs#L362-L363
Then that string is boxed to make panic payload pointer:
https://github.com/rust-lang/rust/blob/31add7e60709445617ab54a69f6f21cfcb2e3122/src/libstd/panicking.rs#L375
If instead you use
panic!with a single argument, that value is the payload which is also boxed:https://github.com/rust-lang/rust/blob/31add7e60709445617ab54a69f6f21cfcb2e3122/src/libstd/panicking.rs#L424
panic!with zero argument has a&'static strpayload that goes in the same code path as with single argument.https://github.com/rust-lang/rust/blob/31add7e60709445617ab54a69f6f21cfcb2e3122/src/libstd/macros.rs#L11-L12
potentially this allocation would not be handled by rust, and thus not be covered by rust's allocation handling, and then Things would happen.
No, these allocations do go through Rust’s global allocator.
If they fail and call a alloc error hook that panics, this is another case of panicking while panicking that can already happen with
Dropimpls. This causes the process to abort. This happening may be undesirable, but it’s not Undefined Behavior.But maybe it'd require
set_alloc_error_hookto beunsafe?Why?
Simon Sapin at 2020-05-16 17:27:04
For why this restriction exists, I don't know why either, but (for other people who may be reading this, I know you've already read it) the three messages starting from #51245 (comment) explain that it's indeed here :)
Because we mark global allocator functions as
nounwindto allow the compiler to optimize better.Amanieu d'Antras at 2020-05-16 17:35:59
@Amanieu do you mean functions like
std::alloc::alloc? Those return null on failure though. This is about unwinding fromstd::alloc::handle_alloc_error, which is already in an error path so optimizing it might be less important?Simon Sapin at 2020-05-16 17:51:36
But maybe it'd require set_alloc_error_hook to be unsafe?
Why?
This was written under the assumption that possibly some platforms would have unwinding happen outside of the scope of what rustc controls, which might end up doing wrong things if in OOM conditions. It's very possible that this assumption is wrong, though -- I would even hope it is, as it would mean that this
unsafewould not be deserved :)Léo Gaspard at 2020-05-16 21:38:55
It's a little hard to tell from the comments both here and on https://github.com/rust-lang/rust/issues/51245 what is happening and/or needs to happen to make progress on this. My company has a use case very similar to the one described (don't want OOM on one task to tear down the whole process), and we might be able to contribute to moving this forward, but would need some guidance on how to help do that. Could someone point me in the right direction?
jgallagher-cs at 2020-11-20 14:57:54
Looks like
panicking on OOM will never correctly handle all OOM cases, becausepanicitself can OOM. What about throwing an object that does not require allocation? Instead ofpanic!("oom")we can do
resume_unwind(Box::new(AllocError)); // AllocError is ZSTThis has the added benefit that the catcher can distinguish this from other panics (which usually are unrecoverable programming errors).
@alexcrichton @pnkfelix Would the Rust team be interested in reviewing a PR that does this? There are a few other places in libstd that do not allow catching an OOM error - I'm looking into these and so far all of them are fixable.
Xiang Fan at 2021-09-10 13:59:34
resume_unwinddoesn't exist for#![no_std]. In addition it doesn't actually print a panic message. For that you needpanic_anywhich doesn't exist for#![no_std]either.#![no_std]only allows panicking with an&strorcore::fmt::Arguments. It doesn't support arbitrary panic payloads.bjorn3 at 2021-09-10 14:30:37
resume_unwind doesn't exist for #![no_std]
I take it that
oom=panicis for users withstd? See rust-lang/rfcs#2116:- For users with unwinding, an oom=panic configuration is added to make global allocators panic on oom.
- For users without unwinding, a try_reserve() -> Result<(), CollectionAllocErr> method is added.
~~Global allocators and~~ unwinding are
std-only. If somedayno_stdgains unwinding support, we'll probably haveresume_unwindforno_stdas well.Xiang Fan at 2021-09-10 14:45:07
Global allocators are available in no_std+alloc scenarios, while
resume_unwindisstdonly I think. Disallowingoom=panicin no_std+alloc would be fine though - those users would likely want to use fallible APIs instead of panicking on oom.Robin Lambertz at 2021-09-10 16:42:19
if there isn't a way to deny that dependencies ever use infallible allocation then that's basically an unworkable situation.
Lokathor at 2021-09-10 16:46:19
Practically speaking,
-C oom=panicwould be implemented by having the compiler-generatedmainfunction callset_alloc_error_hookto change the hook to one that panics instead of aborting. The changes needed to allowalloc_error_handlerto unwind are in #88098 and #88700.no_stdprograms should instead define a customalloc_error_handlerwhich can unwind or abort as needed.Amanieu d'Antras at 2021-09-10 18:41:20
The issue I'm raising is
oom=panicwill never be oom-safe as long as panicking allocates. An allocation error handler simply should not allocate. To solve this, I thinkoom=unwindis a good replacement: the actual reason we wantedoom=panicis to unwind.Bikeshed: Should this be renamed to
alloc-error=unwindafter 2b789bd0570983e82533f9ed30c80312ac334694 changed everything oom to alloc_error?Xiang Fan at 2021-09-10 21:23:51
To solve this, I think oom=unwind is a good replacement: the actual reason we wanted oom=panic is to unwind.
That still requires allocating the exception object I think. Also
catch_unwindwhich actually catches the panic returns a boxed panic payload.The issue I'm raising is oom=panic will never be oom-safe as long as panicking allocates.
I believe this problem can be mitigated by using a bit of memory reserved just for allocations necessary during panicking. Maybe it could even be an eagerly initialized thread local variable.
bjorn3 at 2021-09-10 21:47:16
let's also remember that one allocation failure doesn't mean eternal allocation failure. If you try to allocate a billion byte array and fail, maybe it's still possible to allocate 20 bytes for an error to describe what went wrong.
Lokathor at 2021-09-10 21:49:50
catch_unwind which actually catches the panic returns a boxed panic payload.
A boxed ZST doesn't require allocation.
That still requires allocating the exception object I think.
I believe this problem can be mitigated by using a bit of memory reserved just for allocations necessary during panicking.
I agree. However, there's a distinction between reserving an exception object and reserving a panic payload. Exception objects are an implementation detail and it's easy to make them use reserved memory. Panic payload is already exposed via the
resume_unwindAPI which allows the user to catch the payload. The user can simply drop the payload and now you've lost your reserved memory :(Xiang Fan at 2021-09-10 21:59:29
let's also remember that one allocation failure doesn't mean eternal allocation failure. If you try to allocate a billion byte array and fail, maybe it's still possible to allocate 20 bytes for an error to describe what went wrong.
True. Still, is it really a good idea to rely on "small objects" being available? Most allocators don't request memory in increments of 20 bytes - they might request, say, a 64K block each time. Allocating 20 bytes can still become a 64K memory request and fail.
Xiang Fan at 2021-09-10 22:22:50
Panic payload is already exposed via the
resume_unwindAPI which allows the user to catch the payload. The user can simply drop the payload and now you've lost your reserved memoryIf the
AllocErroris a ZST and ifBoxof a ZST were implemented such that the inner pointer isNULLand allocators are required to implement freeing aNULLpointer as a no-op then there would be no issue, right?More importantly, the oom=panic feature will create memory safety hazards in code that currently doesn't have memory-safety hazards. In general I think we'll have to assume by default that crates are not compatible with oom=panic unless they explicitly claim otherwise, and I think the build system should enforce this. I expanded on this at https://github.com/rust-lang/rfcs/pull/2116#issuecomment-1006144661.
Brian Smith at 2022-01-05 23:03:58
if
Boxof a ZST were implemented such that the inner pointer isNULLI'm not sure what you mean here, but it is currently guaranteed that
Box<T>is a non-null pointer, no matter theT. This allowsOption<Box<T>>to be represented as a single pointer. From https://doc.rust-lang.org/stable/std/boxed/index.html#memory-layout:For zero-sized values, the
Boxpointer still has to be valid for reads and writes and sufficiently aligned.And the docs on pointer validity says:
- A null pointer is never valid, not even for accesses of size zero.
Robin Lambertz at 2022-01-06 00:35:43
Boxdoesn't actually allocate for zero-sized values - it creates a non-null dangling pointer:fn main() { let val = Box::new(()); println!("{:?}", Box::into_raw(val)); }prints
0x1A while back, I experimented with getting a complete zero-alloc panic to work (I used a custom
GlobalAllocthat enforced that no allocations happened after a certain point). I believe I got it working for ZST payloads - I can try to dig up the code.Aaron Hill at 2022-01-06 00:53:02
Tagging as needs-summary; it'd help to know the current state of this so we can understand what it would take to stabilize this.
Josh Triplett at 2022-02-09 18:20:58
It's not implemented yet. The implementation PR is #88098 and is waiting for review.
Amanieu d'Antras at 2022-02-09 18:42:10
@Amanieu what's the current status of this tracking issue? It looks like you implemented the feature in https://github.com/rust-lang/rust/pull/88098, is this just waiting on an eventual push for stabilization?
Jane Losare-Lusby at 2022-06-15 18:41:39
I think this just needs to be stabilized. We might need to audit the standard library to check that we stay sound in the face of unwinding from allocations. Also I'm not sure what the state of the documentation is.
Amanieu d'Antras at 2022-06-16 11:42:39
What are the chances of stabilizing this soon? I've written workaround for it #84612 a year ago, and I wonder if I should push for the workaround still.
Kornel at 2022-10-12 12:40:33
I don't think std is ready for oom=panic yet. The unwinding path is not allocation-free, so likely currently oom=panic only works when a large allocation fail (but the allocator can still satisfy a few small ones).
Gary Guo at 2022-10-12 20:29:54
that still seems like a strict improvement over the current situation.
Lokathor at 2022-10-12 20:56:55
@nbdd0121 As long as this behavior is safe (I assume it causes abort like a panic during a panic), I don't think this issue is a blocker to stabilization, because this deficiency doesn't change the public interface of
oom=panic. It's a quality-of-implementation issue, but even aborting sometimes is better than aborting every time.Kornel at 2022-10-12 21:30:08
Currently we panic with error message, and that will require allocation, and I consider that a publicly visible behaviour.
Gary Guo at 2022-10-12 21:36:40
If the error message is fixed, we should be able to pre-allocate the necessary object (though it might require some hacky code in the panic internals).
Aaron Hill at 2022-10-12 21:42:38
Even for fixed error message we need to allocate a
Box<dyn Any>for it, and since&stris not a ZST, there are actual heap allocation required. We couldn't preallocate it because we need to allocate one of them for each thread (not all threads are created by std APIs).C++ doesn't have this issue because it uses special
__cxa_allocate_exceptionand__cxa_free_exceptioninstead of normal heap allocation, which we can't do because our API is just a plainBox.We could make the panic payload a ZST, and that'll avoid allocation problem on unwind path entirely.
All other allocations that we currently have can be avoided, I think.
Gary Guo at 2022-10-12 21:53:02
We’re constrained to
Box<dyn Any>because it’s part ofstd::thread::Result, right?Simon Sapin at 2022-10-12 22:28:50
There are 2 levels of allocation when panicking, first a
Box<String>orBox<&str>must be allocated to hold the panic message. Then (with dwarf unwinding) an exception objects needs to be allocated on the heap, which holds theBox<Any>.Amanieu d'Antras at 2022-10-13 00:55:04
The exception object doesn't need to be allocated on the heap. It can be
#[thread_local].Gary Guo at 2022-10-13 00:57:06
@nbdd0121 What if the exception object is subsequently passed across a thread boundary, and the thread later terminates while the exception object is still present on some other thread stack or heap?
Ellen Emilia Anna Zscheile at 2022-10-13 07:37:09
Libunwind exception objects can't be sent to another thread. They are freed when the exception is caught an not rethrown.
However the process of unwinding could trigger additional memory allocations due to drop impls, which could cause another out-of-memory exception to be thrown. C++ handles this by only having one OOM exception object in TLS, if that one is already taken then the exception is allocated normally (with an abort if that allocation fails).
Amanieu d'Antras at 2022-10-13 11:55:31
wouldn't it make more sense to collapse multiple out-of-memory exceptions into one (or, like, have a simple counter for the occurrences, not that it matters much), and stop executing that drop impl (I'm not sure how that would affect invariants and such, but the unwinding would be there either way, so I don't think it would change much), are there downsides to that besides even more unpredictable control flow in OOM cases?
Ellen Emilia Anna Zscheile at 2022-10-13 19:36:28
and stop executing that drop impl
This breaks the pinning invariant for things on the stack that are pinned and as such is unsound.
bjorn3 at 2022-10-13 19:51:12