Abort on some large allocation requests, Panic on other
Compare these two:
let v = vec![0u8; !0];
let u = vec![0u16; !0];
We request a vector with 18446744073709551615 elements.
- For
u8we receive out of memory (null) from the allocator, and callalloc::oom, which aborts the program:application terminated abnormally with signal 4 (Illegal instruction) - For
u16, we get an assertion:thread '<main>' panicked at 'capacity overflow', ../src/libcore/option.rs:330
This is inconsistent. Why don't we abort on both of these cases? We abort on too large allocation requests, so those that are even larger could abort too?
@bluss
For the record, your code is the equivalent of
fn main() { let v = ::std::vec::from_elem(0u16, !0); }In the
u16case, the amount of memory that is supposed to be allocated is2*(2^64 - 1), which exceeds the size ofusize. That is checked for rather explicitly inVec::with_capacity. Maybe we should abort instead of panicking in that case too.Ariel Ben-Yehuda at 2015-07-11 12:03:20
Additional info: already
!0bytes is more than we can safely allow allocating (due to known fact #22104). We should treat that as capacity overflow, but decline to do so, because it will abort anyway.bluss at 2015-07-11 12:37:17
/cc @rust-lang/compiler , I think?
Steve Klabnik at 2015-07-16 18:49:12
This is a @rust-lang/libs problem.
I am of the opinion that aborts should only occur on problems that unwinding can't address. That is, if you overflow the capacity we have to check for that, and we can panic right away. Everything will be cleaned up and the world will be happy.
However if an allocation makes it past our checks and then fails in the allocator, we have historically conservatively assumed that you are legitimately OOM and abort; unwinding can lead to arbitrary other allocations which is presumably bad to do during OOM.
In practice, the platforms we support basically can't OOM naturally (by which I mean, you're asking for the last few bits of free memory), so I reckon if you ever managed to trigger an OOM it's because you managed to request All The Memory, in which case there's plenty of slack space for unwinding. If it's a legit OOM then we'll double panic and crash anyway.
However https://github.com/rust-lang/rust/issues/14674 mentions that aborting on OOM enables LLVM to better optimize around allocations. I don't know the details.
Edit: Regardless, I do not want to add any kind of "heuristic checks" for "will probably OOM" or "was probably a real OOM".
Aria Desires at 2015-07-16 20:20:54
I think that, ultimately, abort vs. unwind behaviour should be left up to the implementation, but the standard library should abort on failed allocations. This particular example is just a red herring. It's only because
u8is a single byte that it doesn't trigger any kind of arithmetic overflow behaviour. It's an interesting quirk, but doesn't really reflect any real inconsistent behaviour.James Miller at 2015-07-17 01:50:28
Also to be completely clear to those not super familiar with the issue: the "odd" behaviour described in this issue is completely by design, and not a mistake. Rather this issue is positing that the behaviour should be changed to be more consistent.
Aria Desires at 2015-07-17 07:10:30
Of course. I don't think it's completely by design, it's patched up to be this way. For example, if we know a capacity of
> isize::MAXis always too large, why does that abort and not panic?bluss at 2015-07-24 21:33:53
@bluss any time we detect this, I believe we panic. Only
alloc(..) == nullaborts to my knowledge: https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rsWe simply don't even check in some cases where we know any degeneracies will be caught by the allocator (e.g. when doubling capacity on 64-bit).
Aria Desires at 2015-07-24 23:28:36
Note that checking for
> isize::MAXon 64-bit is almost surely useless since your system will OOM you far before that point using normal growth. If you explicit ask for> isize::MAXthen you're basically arbitrarily "lucky". It's almost inconsistent to check, in that regard.Aria Desires at 2015-07-24 23:31:48
checking for > isize::MAX on 64-bit is almost surely useless since your system will OOM you far before that point using normal growth.
That’s what they said about 32 bits timestamps and addressing too.
Simonas Kazlauskas at 2015-07-24 23:47:00
@nagisa Today this is simply a hard hardware issue: you only get 48 bits of the address space.
Aria Desires at 2015-07-24 23:51:38
Yes, but it is always useful to be aware of future prospects.
Simonas Kazlauskas at 2015-07-24 23:53:06
@nagisa I have thankfully architected the current design to make such an upgrade trivial. In fact, all you have to do is delete some code!
https://github.com/rust-lang/rust/blob/master/src/liballoc/raw_vec.rs#L445-L453
Aria Desires at 2015-07-24 23:54:32
So is this not a bug?
Steve Klabnik at 2015-08-04 21:54:10
It's not an accident, at least.
On Tue, Aug 4, 2015 at 2:54 PM, Steve Klabnik notifications@github.com wrote:
So is this not a bug?
— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/26951#issuecomment-127773130.
Aria Desires at 2015-08-04 22:17:40
I think all conversations I've had about this have said that performance improvement from aborting instead of panic are unlikely; it saves code size in the binary with likely very little to no runtime impact. We don't have numbers.
So I reported this because I'd like to use
abort()for both of the "oom" cases mentioned in the issue's description. It is sensible to do after we get a message on abort working (this is a bug).bluss at 2015-08-04 22:36:52
I think the @rust-lang/libs team just needs to make a final call on this. I think @bluss and I have both made our stances fairly clear, but to summarize (hopefully I'm not misrepresenting bluss):
Always Abort:
- More consistent (always aborts when too much memory requested)
- Potential theoretical perf gains to be had (llvm can optimize around straight-up ending the program)
Sometimes Abort:
- Does "the most friendly thing we can" without incurring unnecessarily overhead (panics can recover -- aborts can't)
Never Abort (wild card!):
- There is no such thing as a "true" oom. You will be killed by your OS first.
- Destructors unlikely to allocate and will just trigger a double panic => abort if they do.
Aria Desires at 2015-08-04 23:01:49
Code size may have an impact (from unrelated Servo discussion)
1:
if it takes even 5% off of our code size, that could have some significant impact on Servo's performance, particularly on embedded hardware with relatively small i-cache sizes.
bluss at 2015-08-05 00:34:17
triage: P-medium
As an action item the libs team has decided to see if it's possible to panic on OOM and we can possibly reevaluate from there.
Alex Crichton at 2015-08-05 20:49:54
Would be interesting to check is a singleton pointer value is enough to unwind without allocating. The catching mechanism would
boxan "OOM happened" ZST (which doesn't actually allocate) asBox<Any>for that specific value, instead of transmuting the pointer toBox<Box<Any>>.In the case of true OOM, it's possible that secondary allocations during unwinding (which should be rare, I can't recall ever seeing an allocating destructor) could succeed due to deallocations earlier in the unwinding process.
Eduard-Mihai Burtescu at 2015-08-06 06:41:43
@eddyb I think our main concern is libunwind allocating. A quick look seems to reveal they never check their mallocs. However maybe there's some macro shenanigans or maybe this is all utility code that isn't called by us. I didn't really dig deep into it (also maybe I checked out the wrong version...).
Aria Desires at 2015-08-06 15:15:37
In particular if Rust allocates during OOM that should be fine -- we always check and a double panic is just an abort.
Aria Desires at 2015-08-06 15:16:16
Java pre-allocates an OutOfMemory exception at initialization to ensure it can properly throw when out of memory - it might make sense to do the same thing here if possible.
Steven Fackler at 2015-08-06 16:06:41
IIRC Python does something similar as well.
Sent from my iPhone
On Aug 6, 2015, at 12:06, Steven Fackler notifications@github.com wrote:
Java pre-allocates an OutOfMemory exception at initialization to ensure it can properly throw when out of memory - it might make sense to do the same thing here if possible.
― Reply to this email directly or view it on GitHub.
Steve Klabnik at 2015-08-06 16:18:12
Preallocating might be difficult given that Rust code might not be called from inside a Rust main. I guess we could use pre-allocation for when it is called from a Rust program, and then fallback to something else for the called by foreign code case.
James Miller at 2015-08-07 02:00:20
cc me
Felix S Klock II at 2015-08-07 05:15:54
@Aatch the problem of preallocation seems pointless, since we control what the pointer value means, and if it points to some
staticinliballoc, then we can treat it as an OOM panic instead of aBox<Box<Any+Send>>value.Eduard-Mihai Burtescu at 2015-08-07 09:01:02
Note that this is a memory safety issue for a few types: Box::new can't panic today -- this would introduce more exception-safety nonsense.
Aria Desires at 2015-08-07 18:51:52
I don't really have an opinion on how the problem is solved, I would just recommend fixing the illegal instruction error soonest. I had bad code that was incorrectly calculating a vector length that only kicked in when I tried to run the release mode without arguments. Having the program crash repeatedly with "Illegal Instruction" had me incorrectly assuming the rust compiler was fubar and generating bad code.
Least I'm assuming it's the same issue with Rust 1.2 on a 64 bit machine calling something like: Vec::with_capacity(33333333333333)
pixel27 at 2015-08-19 01:21:50
Please don't panic on OOM, especially now that catch_panic is safe.
Joshua Yanovski at 2015-09-17 15:50:59
@pixel27 I imagine a good fix would be a better way to report the issue. Maybe an output message before the abort (which I believe is an issue we have filed).
bluss at 2015-09-17 16:26:07
@pythonesque Why? If it's not possible to deal with OOM in Rust, I can't use Rust. See https://internals.rust-lang.org/t/could-we-support-unwinding-from-oom-at-least-for-collections/3673/21 for an exploration of why handling OOM is important.
Lilith River at 2016-08-28 23:38:24
@nathanaeljones IMO the real problem here is that we don't have an allocator abstraction in
libstdyet.Eduard-Mihai Burtescu at 2016-08-28 23:54:21
@nathanaeljones the default, std allocator aborts, but given Rust's low-level nature, you could replace it with one that doesn't. It is a lot more work right now, but as @eddyb said, this is something that will come along in time.
Steve Klabnik at 2016-08-29 00:04:06
@steveklabnik Thanks; I'm not too concerned with 'lots of work', but rather if I can produce well-behaved binaries for both FFI and server use.
I started porting Imageflow over to Rust in June, but failed to verify that OOM panics (as widely reported) instead of aborting. I'd assumed that the stabilization of
std::panic::catch_unwindmeant Rust was ready for my use case. I decided to write a test today and, well, panicked.It's very unclear to me where the anti-OOM-panic sentiment arises from. Double panics abort.
I've been trying to track down blockers around this, but haven't been able to discover the reasons why movement on graceful OOM handling has been slow. I think I ?might? have found the right issue to discuss: https://github.com/rust-lang/rust/issues/27700#issuecomment-243009177
Lilith River at 2016-08-29 00:41:42
Triage: still blocked on a libs team decision.
Steve Klabnik at 2019-12-25 16:07:51
Looks like there is now an unstable
alloc_error_handlerfeature gate. Tracking issue: https://github.com/rust-lang/rust/issues/51540Mentioned in that issue, there is also an accepted RFC to set a cargo attribute for
oom=panic, tracking issue: https://github.com/rust-lang/rust/issues/43596And another issue to make the default for
handle_alloc_errorpanic, which has survived the FCP with disposition to merge: https://github.com/rust-lang/rust/issues/66741Ehsanul Hoque at 2020-05-11 02:18:30
We discussed this in the Libs API meeting today: the consensus was that we should pipe "invalid layout" errors through to
alloc_error_handlerby making it take aenumwith eitherAllocErrandLayoutErr. This would be similar to the existing unstableTryReserveErrortype, and in fact we could reuse the same type in both places.Amanieu d'Antras at 2021-09-22 21:35:03