Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta
I'm trying to build a cdylib (https://github.com/urschrei/lonlat_bng) which requires the https://github.com/urschrei/ostn15_phf crate.
Building on AppVeyor, on i686-pc-windows-gnu, using the latest beta, fails with an OOM error:
Details
The `ostn15_phf` crate is essentially just a very big static map, built using PHF (the generated, uncompiled map is around 42.9 MB).
The build passed when running cargo test, using rustc 1.12.0-beta.3 (341bfe43c 2016-09-16):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/105/job/3y1llt6luqs3phs3
It's now failing when running cargo test, using rustc 1.12.0-beta.6 (d3eb33ef8 2016-09-23):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/27pgrkx2cnn2gw50
The failure occurs when compiling `ostn15_phf`, with `fatal runtime error: out of memory`.
cc @brson just want to make sure you're aware of this.
Steven Fackler at 2016-09-28 21:32:44
Related to #36926 perhaps? ("1.12.0: High memory usage when linking in release mode with debug info")
Niko Matsakis at 2016-10-04 09:38:01
@urschrei have you tried this on platforms other than windows, out of curiosity?
Niko Matsakis at 2016-10-06 13:33:24
I see. It seems to work on `x86_64-pc-windows-gnu`, but not `i686`.
Niko Matsakis at 2016-10-06 13:36:14
I haven't tried to build i686 on Linux or OSX, but I easily could…
Stephan Hügel at 2016-10-06 13:45:13
Well, I just did a run on my linux box. The memory usage is certainly through the roof: https://gist.github.com/nikomatsakis/ea771dd69f12ebc5d3d5848fa59fb43a
This is using nightly (`rustc 1.13.0-nightly (a059cb2f3 2016-09-27)`). The peak is around 4 GB.
Niko Matsakis at 2016-10-06 13:48:47
So @alexcrichton has executed a massif run: https://gist.github.com/alexcrichton/d20d685dd7475b1801a2ccac6ba15b08
~~The peak result~~ Measurement 58 shows something like this:
| Percentage | Area |
| --- | --- |
| 36% | HIR |
| 9% | MIR |
| 14% | type/region arenas |
| 5% | region maps |
| 2% | constants |
The peak result (measurement 48) looks pretty similar. More memory used by MIR:
| Percentage | Area |
| --- | --- |
| 26% | HIR |
| 14% | MIR |
| 5% | type/region arenas |
| 4% | region maps |
| 3% | constants |
These numbers are just based on a kind of quick scan of the gist.
Niko Matsakis at 2016-10-06 17:10:36
It seems clear we need to revisit our handling of statics and constants in a big way. But then this has been clear for a long time. =) I'm wondering if we can find some kind of "quick fix" here.
I also haven't compared with historical records -- but most of that memory we see above, we would have been allocating before too, so I'm guessing this is a case of being pushed just over the threshold on i686, versus a sudden spike.
Niko Matsakis at 2016-10-06 17:16:02
Sizes of some types from MIR (gathered from play):
- statement: 192
- statement-kind: 176
- lvalue: 16
- rvalue: 152
- local-decl: 48

Niko Matsakis at 2016-10-07 17:15:34
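For context, numbers like these can be gathered on the playground with `std::mem::size_of`. A minimal sketch using a made-up stand-in type, not the actual MIR definitions:

```rust
use std::mem::size_of;

// Stand-in for an rvalue-like enum: an enum is at least as large as its
// biggest variant, plus a discriminant unless a niche can be found.
enum Rvalue {
    Use(u64),
    Ref(u64, u64),
    Aggregate(Vec<u64>),
}

fn rvalue_size() -> usize {
    size_of::<Rvalue>()
}

fn main() {
    // The Vec-carrying variant dominates the size of the whole enum.
    assert!(rvalue_size() >= size_of::<Vec<u64>>());
    println!("rvalue: {} bytes", rvalue_size());
}
```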
I am feeling torn here. It seems the best we can do short-term is to make some small changes to MIR/HIR and try to bring the peak down below 4GB. The long term fix would be to revisit the overall representation particularly around constants and see what we can do to bring the size down. One thing I was wondering about (which is probably answerable from @alexcrichton's measurements) is what percentage of memory is being used just in the side-tables vs the HIR itself.
In any case, it seems like 152 bytes for a mir rvalue is really quite big.
Niko Matsakis at 2016-10-12 19:42:37
Looking again at the massif results, it looks like MIR is taking more memory than I initially thought. One place that seems to use quite a bit is the list of scopes.
Niko Matsakis at 2016-10-12 19:48:53
Better numbers:
| percent | pass |
| --- | --- |
| 35.09% | HIR |
| 34.96% | MIR |
| 11.77% | mk_region, node_type_insert |
| 6.24% | region_maps |
| 2.42% | consts |
| 1.39% | spans |
Niko Matsakis at 2016-10-12 20:19:17
Just pinging @nikomatsakis to keep this P-high bug on his radar.
Brian Anderson at 2016-10-20 16:12:26
I've made basically no progress here, I'm afraid. I think the most likely path forward in short term is to try and reduce memory usage in various data structures. Not very satisfying though. I'm not sure if we can make up the gap that way.
Niko Matsakis at 2016-10-20 17:49:33
Discussed in compiler meeting. Conclusion: miri would be the proper fix, but maybe we can shrink MIR a bit in the short term, perhaps enough to push us over the edge. I personally probably don't have time for this just now (have some other regr to examine). Hence re-assigning to @pnkfelix -- @arielb1 maybe also interested in doing something?
Niko Matsakis at 2016-10-20 20:23:34
cc @nnethercote in case he might have input on ways to attack this
Felix S Klock II at 2016-10-27 20:16:52
Massif is slow and the output isn't that easy to read, but it's the ideal tool for tackling this. The peak snapshot in @alexcrichton's data is number 48, and I've made a nicer, cut-down version of it here: https://gist.github.com/nnethercote/935db34ff2da854df8a69fa28c978497
You can see that ~30% of the memory usage comes from `lower_expr`. I did a DHAT run (more on that in a moment) and this is the main culprit: `ExprKind::Tup(ref elts) => { hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect()) }`. `push_scope`/`pop_scope` account for another ~15%.
Nicholas Nethercote at 2016-10-28 04:20:19
@nnethercote
The `push_scope`/`pop_scope` entries are MIR things. I think interning lvalues (they take 16 bytes, but occur multiple times per statement/terminator - they should take 4) and boxing rvalues (many statements are storage statements that don't have an rvalue) should claw back some performance.

Also, `enum Operand` is very common, in most contexts just an lvalue, and takes 72 bytes. Something should be done about this, even just boxing constants (which would reduce it to 24 bytes, or 16 bytes with interning lvalues, or 8 bytes with interning lvalues and bitpacking).
Ariel Ben-Yehuda at 2016-10-28 05:50:16
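The interning idea sketched as standalone code (a hypothetical generic `Interner`, not rustc's actual implementation): each distinct value is stored once, and every occurrence is replaced by a 4-byte index.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Generic interner: stores each distinct value once and hands out a
// u32 index, so repeated occurrences cost 4 bytes instead of the full
// value. (Sketch of the idea discussed above, not rustc's code.)
struct Interner<T: Clone + Eq + Hash> {
    items: Vec<T>,
    map: HashMap<T, u32>,
}

impl<T: Clone + Eq + Hash> Interner<T> {
    fn new() -> Self {
        Interner { items: Vec::new(), map: HashMap::new() }
    }

    fn intern(&mut self, value: T) -> u32 {
        if let Some(&idx) = self.map.get(&value) {
            return idx; // already interned: reuse the index
        }
        let idx = self.items.len() as u32;
        self.items.push(value.clone());
        self.map.insert(value, idx);
        idx
    }

    fn get(&self, idx: u32) -> &T {
        &self.items[idx as usize]
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern((1u64, 2u64)); // a hypothetical 16-byte lvalue
    let b = interner.intern((1u64, 2u64)); // duplicate: same index back
    assert_eq!(a, b);
    assert_eq!(interner.get(a), &(1u64, 2u64));
}
```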
Having a 32-bit MIR "shared object representation" `MirValue` with 3/4 tag bits and the rest an index to an interner would basically apply all of these suggestions (where `MirValue` can represent all of Constant, Lvalue, Rvalue, Operand, and we keep the structs as strongly-typed newtypes).

We can then intern `MirValue`s and/or occasionally GC-intern them, for further memory gain.
Ariel Ben-Yehuda at 2016-10-28 06:24:13
`push_scope`/`pop_scope` account for another ~15%.

`Builder::scope_auxiliary` ends up with 5.8M elements. It's only used when MIR is dumped. I tried removing it and the peak RSS dropped by 6.7%, from 4.86 GB to 4.53 GB. It seems like it should be straightforward to make the `scope_auxiliary` pushes occur only when MIR dumping is enabled.
Nicholas Nethercote at 2016-10-28 06:37:18
@nnethercote
Well, scopes are supposed to eventually be used for debuginfo, so we don't want to remove them (BTW, the `Vec` would be 33% smaller if `VisibilityScope` was a nonzero).

BTW, how do you do your benchmarks?
Ariel Ben-Yehuda at 2016-10-28 07:59:34
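The "nonzero" remark refers to Rust's niche optimization: an index type that excludes zero lets `Option` reuse the forbidden value as `None`, costing no extra space. A minimal illustration with the standard-library `NonZeroU32` (the real `VisibilityScope` type is not shown here):

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn plain_size() -> usize {
    // Option<u32> needs a separate discriminant, padding the pair to 8 bytes.
    size_of::<Option<u32>>()
}

fn niche_size() -> usize {
    // Option<NonZeroU32> encodes None as the forbidden value 0: still 4 bytes.
    size_of::<Option<NonZeroU32>>()
}

fn main() {
    assert_eq!(plain_size(), 8);
    assert_eq!(niche_size(), 4);
}
```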
Massif is slow and the output isn't that easy to read
I was reminded of the fact that `massif-visualizer` exists on Linux. It's a nice graphical viewer for Massif output files. Much better than the textual output from `ms_print`.
Nicholas Nethercote at 2016-10-31 02:09:49
BTW, how do you do your benchmarks?
With Massif and DHAT, both of which are Valgrind tools. https://gist.github.com/nnethercote/10ec6918e02029e7609c9bb8d594b986 shows how I invoke them on the rustc-benchmarks.
Nicholas Nethercote at 2016-10-31 02:11:46
@arielb1
Well, scopes are supposed to eventually be used for debuginfo, so we don't want to remove them (BTW, the Vec would be 33% smaller if VisibilityScope was a nonzero).
Do we need the full detail for debuginfo?
Niko Matsakis at 2016-11-01 18:02:40
Do we need the full detail for debuginfo?
Good question. `scope_auxiliary` currently costs a 7% peak memory penalty for this workload, for the benefit of (a) pretty-printing, and (b) something involving debuginfo in the future. That's not compelling, IMHO, and I think we should definitely consider how to avoid it.
Nicholas Nethercote at 2016-11-03 03:12:26
Hmm, @eddyb thinks there should not be visibility scopes without variables. That's a bit confusing. @nnethercote, were you analyzing @alexcrichton's data? @alexcrichton, do you remember if you analyzed nightly or some earlier thing?
Niko Matsakis at 2016-11-03 20:19:44
So the `push_scope` memory is actually temporary values used to store the current scope during construction. I'm a bit surprised that it gets so large for constants, but it may make sense. Still, it's not really related to visibility scopes, and that memory should be freed after construction. This case is sort of "worst-case" because it is one gigantic function.
Niko Matsakis at 2016-11-03 20:25:03
triage: P-medium
After lengthy discussion in the @rust-lang/compiler team mtg, we decided to demote this to medium. It'd be great to make some improvements to MIR representation, but we think it's unlikely we'll get enough to push this back under 4GB, and that the use case here is not representative (that is, MIR memory usage is not a big problem in general).
The right fix for this particular code is definitely changing how we handle constants. Hopefully we can pursue that more aggressively. But we'd never backport such a thing.
(Personally, I think it's still worth experimenting to see if we can lower MIR usage through various simple changes of the kind that have been discussed here.)
Niko Matsakis at 2016-11-03 20:32:49
@nnethercote, were you analyzing @alexcrichton's data?
I have looked at that data and also done a bunch of Massif and DHAT runs myself, which showed much the same results.
Nicholas Nethercote at 2016-11-03 21:18:11
It'd be great to make some improvements to MIR representation
HIR is a part of the problem too, `lower_expr` in particular. That memory lives as long as the MIR does. Is there a possibility of the HIR being freed earlier? Or is this another manifestation of the "one giant function" problem?

Another note: most of the file consists of an array of elements like this: `(694236, (96.220, -52.874, 53.638)),`. The nested tuple increases memory usage. If it were a flat 4-tuple, that would reduce memory usage a decent amount. Four separate arrays would probably be even better. Having to rewrite the code to work around compiler limitations is far from ideal, but this info might be useful to @urschrei right now.
Nicholas Nethercote at 2016-11-03 21:26:13
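For the record, the suggested rewrite looks like this (the data is the one entry quoted above; the point is that the flat form needs one fewer tuple `Expr` node per entry on the compiler side, while the data is the same either way):

```rust
// Nested form, as in the generated file: each entry is a tuple
// containing another tuple, i.e. two tuple expressions per entry.
static NESTED: [(u32, (f64, f64, f64)); 1] =
    [(694236, (96.220, -52.874, 53.638))];

// Flat form: one tuple expression per entry.
static FLAT: [(u32, f64, f64, f64); 1] =
    [(694236, 96.220, -52.874, 53.638)];

fn lookup_nested(i: usize) -> f64 {
    (NESTED[i].1).0
}

fn lookup_flat(i: usize) -> f64 {
    FLAT[i].1
}

fn main() {
    // Same data is recoverable from either representation.
    assert_eq!(lookup_nested(0), lookup_flat(0));
}
```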
@alexcrichton, do you remember if you analyzed nightly or some earlier thing?
Unfortunately I've since forgotten, but I remember having to recompile rustc against the system allocator and it was likely this compiler or somewhere around there.
Alex Crichton at 2016-11-03 21:44:50
That memory lives as long as the MIR does. Is there a possibility of the HIR being freed earlier? Or is this another manifestation of the "one giant function" problem?
This is the "one giant function" problem. We could try to be "Javascript-like" and parse HIR lazily when doing MIR lowering, but that's a fair bit of complexity.
Ariel Ben-Yehuda at 2016-11-03 22:24:20
I've implemented some visitors that collect data on the AST and HIR: https://github.com/rust-lang/rust/pull/37583 Their output for this crate is shown below.
tl;dr: Reducing the size of `ast::Expr` (similar to what has already happened for `hir::Expr`) is probably a good investment.

```
PRE EXPANSION AST STATS

Name              Accumulated Size      Count   Item Size
----------------------------------------------------------------
Mod                             40          1          40
PathListItem                    72          2          36
Local                           96          2          48
Arm                            128          2          64
Mac                            240          3          80
FnDecl                         432          9          48
StructField                    440          5          88
Block                          480         10          48
Stmt                           480         12          40
ImplItem                       896          4         224
Pat                          1_344         12         112
Attribute                    3_408         71          48
Item                         4_352         17         256
PathSegment                  5_976         83          72
Ty                           6_496         58         112
Expr                         9_728         64         152
----------------------------------------------------------------
Total                       34_608

POST EXPANSION AST STATS

Name              Accumulated Size      Count   Item Size
----------------------------------------------------------------
Mod                             40          1          40
Local                           48          1          48
PathListItem                    72          2          36
Arm                            128          2          64
FnDecl                         336          7          48
Stmt                           360          9          40
Block                          384          8          48
StructField                    440          5          88
ImplItem                       896          4         224
Pat                          1_232         11         112
Attribute                    3_408         71          48
Item                         4_352         17         256
PathSegment                  7_128         99          72
Ty                           7_168         64         112
Expr                 1_013_064_952  6_664_901         152
----------------------------------------------------------------
Total                1_013_090_944

HIR STATS

Name              Accumulated Size      Count   Item Size
----------------------------------------------------------------
Mod                             32          1          32
Decl                            32          1          32
Stmt                            40          1          40
Local                           48          1          48
PathListItem                    56          2          28
Arm                             96          2          48
FnDecl                         280          7          40
StructField                    360          5          72
Block                          384          8          48
ImplItem                       704          4         176
Pat                          1_056         11          96
Path                         2_880         90          32
Attribute                    3_408         71          48
Item                         3_672         17         216
Ty                           5_120         64          80
PathSegment                  6_400        100          64
Expr                   639_830_400  6_664_900          96
----------------------------------------------------------------
Total                  639_854_968
```

Michael Woerister at 2016-11-04 15:49:56
tl;dr: Reducing the size of ast::Expr (similar to what has already happened for hir::Expr) is probably a good investment.
These are interesting measurements, but I disagree with this conclusion. Let me explain why. Here is a screenshot of massif-visualizer's output for this program.

There is an early peak around the 180,000 ms mark which is dominated by the AST. However, that memory is soon reclaimed -- note how the green, yellow, and orange segments disappear. (That these segments correspond to AST memory is hard to tell from the screenshot, but is clear when I interact with massif-visualizer.)
But the global peak is much later, around the 1e+06 ms mark. At this point memory usage is dominated by HIR, MIR and Contexts/Arenas. Reducing the AST size won't reduce the global peak.
Massif really is the ideal tool for finding and reducing peak memory usage. It's exactly what it was designed for.
Nicholas Nethercote at 2016-11-04 21:23:31
These are interesting measurements, but I disagree with this conclusion.
That's why I wrote that it's "probably a good investment" instead of "this will solve the problem" :)

But you are right of course, it won't help with the later memory spike.

This massif-visualizer output looks pretty neat! I'll have to give it a try.
Michael Woerister at 2016-11-07 16:06:11
Pinging @michaelwoerister @arielb1 @nikomatsakis what's the status of this? Can we get it fixed for good any time soon?
Brian Anderson at 2016-11-17 17:09:45
Did #37764 maybe fix this? It should have an impact on peak memory usage here, right?
Michael Woerister at 2016-11-17 17:48:58
Yep, looks like it did: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/119
Stephan Hügel at 2016-11-17 18:20:13
OK, I'm going to close this issue then, though we still obviously can do a lot to reduce memory usage for big static constants (and maybe MIR in general).
Niko Matsakis at 2016-11-17 20:57:15
Since this issue was filed, both #37445 and #37764 landed. Together they reduced peak RSS by roughly 20%. Not sure if that's enough of an improvement to avoid the OOM @urschrei was seeing.
Taking a slightly longer view, here's a comparison of 1.11 vs a recent trunk, doing debug builds, measured with /usr/bin/time on Linux:
1.11:

```
104.13user 1.88system 1:45.51elapsed 100%CPU (0avgtext+0avgdata 8619484maxresident)k
101768inputs+305992outputs (309major+1016440minor)pagefaults 0swaps
```

Trunk:

```
68.28user 0.96system 1:08.98elapsed 100%CPU (0avgtext+0avgdata 3887928maxresident)k
0inputs+330936outputs (0major+522179minor)pagefaults 0swaps
```

So peak memory is 2.21x better and compile time is 1.53x better.
Nicholas Nethercote at 2016-11-17 22:05:35
@nnethercote the latest nightly builds on i686 without OOM!
Stephan Hügel at 2016-11-17 22:28:00
An interesting observation gleaned via `-Z print-type-sizes` in #37770: `mir::TerminatorKind` (256 bytes) has some low-hanging fruit; in particular, I suspect the `Assert` variant is not common and thus could have some of its structure boxed.

```
print-type-size type: `mir::TerminatorKind` overall bytes: 256 align: 8
print-type-size   discriminant bytes: 4
print-type-size   variant Goto exact bytes: 4
print-type-size     field .target bytes: 4
print-type-size   variant If exact bytes: 84
print-type-size     padding bytes: 4
print-type-size     field .cond bytes: 72 align: 8
print-type-size     field .targets bytes: 8
print-type-size   variant Switch exact bytes: 52
print-type-size     padding bytes: 4
print-type-size     field .discr bytes: 16 align: 8
print-type-size     field .adt_def bytes: 8
print-type-size     field .targets bytes: 24
print-type-size   variant SwitchInt exact bytes: 76
print-type-size     padding bytes: 4
print-type-size     field .discr bytes: 16 align: 8
print-type-size     field .switch_ty bytes: 8
print-type-size     field .values bytes: 24
print-type-size     field .targets bytes: 24
print-type-size   variant Resume exact bytes: 0
print-type-size   variant Return exact bytes: 0
print-type-size   variant Unreachable exact bytes: 0
print-type-size   variant Drop exact bytes: 32
print-type-size     padding bytes: 4
print-type-size     field .location bytes: 16 align: 8
print-type-size     field .target bytes: 4
print-type-size     field .unwind bytes: 8
print-type-size   variant DropAndReplace exact bytes: 104
print-type-size     padding bytes: 4
print-type-size     field .location bytes: 16 align: 8
print-type-size     field .value bytes: 72
print-type-size     field .target bytes: 4
print-type-size     field .unwind bytes: 8
print-type-size   variant Call exact bytes: 140
print-type-size     padding bytes: 4
print-type-size     field .func bytes: 72 align: 8
print-type-size     field .args bytes: 24
print-type-size     field .destination bytes: 32
print-type-size     field .cleanup bytes: 8
print-type-size   variant Assert exact bytes: 248
print-type-size     padding bytes: 4
print-type-size     field .cond bytes: 72 align: 8
print-type-size     field .expected bytes: 1
print-type-size     padding bytes: 7
print-type-size     field .msg bytes: 152 align: 8
print-type-size     field .target bytes: 4
print-type-size     field .cleanup bytes: 8
print-type-size end padding bytes: 4
```

Felix S Klock II at 2016-11-18 22:39:38
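The boxing idea sketched with hypothetical types (not the real `TerminatorKind`): moving a rare variant's large payload behind a `Box` shrinks every value of the enum, because an enum is sized for its largest variant.

```rust
use std::mem::size_of;

// Hypothetical large payload standing in for Assert's 152-byte msg.
struct BigMsg([u64; 19]);

enum Unboxed {
    Goto(u32),
    Assert(BigMsg), // stored inline: every Unboxed is at least 152 bytes
}

enum Boxed {
    Goto(u32),
    Assert(Box<BigMsg>), // boxed: only a pointer stored inline
}

fn unboxed_size() -> usize {
    size_of::<Unboxed>()
}

fn boxed_size() -> usize {
    size_of::<Boxed>()
}

fn main() {
    assert!(unboxed_size() >= size_of::<BigMsg>());
    assert!(boxed_size() < unboxed_size());
}
```

The cost is one extra allocation and pointer chase per `Assert`, which is cheap if the variant really is rare.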
`Operand` is 72 bytes?! Shrinking that alone might have a bigger impact than anything we'd do to terminators (which are just one per BB).
Eduard-Mihai Burtescu at 2016-11-19 01:59:16
I too have seen `Operand` come up multiple times while looking at this, and I agree that shrinking it could be a decent win. #37770 will help with understanding why it's so big.
Nicholas Nethercote at 2016-11-19 02:29:43
Probably due to directly embedding const values.
Eduard-Mihai Burtescu at 2016-11-19 02:35:32
Tagging @brson @michaelwoerister @arielb1 @eddyb, because the OOM error has begun to re-occur on the following platforms:
- Linux x86: Nightly and Beta
- Windows x86 / i686: Nightly (not currently testing on Beta)

On Mac OS x86, stable, beta, and nightly currently pass. On Linux, stable passes. On Windows x86, stable passes.
Also note that since `ostn15_phf` is too big for crates.io, a crater / cargobomb pass will never check it…
Stephan Hügel at 2017-04-26 14:54:48
can you bisect it?
Ariel Ben-Yehuda at 2017-04-27 19:17:39
Possibly related to https://github.com/rust-lang/rust/issues/40355. I'll see if I can find a working previous nightly tomorrow.
Stephan Hügel at 2017-04-27 23:24:06
Thanks for reopening @urschrei. Vexing problem.
Brian Anderson at 2017-05-04 16:06:33
Seems like we didn't take any action to reduce the sizes of various MIR constructs, is that correct?
Niko Matsakis at 2017-05-04 16:10:04
We discussed in the meeting. It's frustrating that peak memory usage crept up here. We are considering the "proper fix", which would basically detect the fact that this is a massive constant and try to create a byte-array instead of MIR instructions. There is some debate as to the best way to do this (something relatively special-cased? or can we "execute with miri as we go"?). We'd also need to extend MIR with the concept of an allocation and so forth so we can represent that.
In the meantime, I'm going to mark as P-high and assign to myself with the hope that I can find some low-hanging fruit in the MIR definition to prune memory usage. I feel like we've not picked all that fruit yet.
Niko Matsakis at 2017-05-11 20:51:31
triage: P-high
Niko Matsakis at 2017-05-11 20:51:36
I think what I'd do is:
- identify HIR subexpressions that we may turn into a byte buffer (very limited subset)
- use HAIR to turn those subexpressions, one leaf at a time, into miri values
- write the leaves into a miri allocation in their appropriate locations
Eduard-Mihai Burtescu at 2017-05-11 21:03:36
@eddyb that sounds like what I had in mind as well
Niko Matsakis at 2017-05-12 11:02:13
So I did some digging into the MIR representation. There is definitely some inefficiency here, but I think ultimately the best thing we could do in this case is to remove (from statics) the `StorageLive` and `StorageDead` annotations. Something like 60% of the MIR in this case is `StorageLive`/`StorageDead` statements, and those are totally meaningless in a static constant anyhow!
Niko Matsakis at 2017-05-15 23:21:12
Looks like we have some code to suppress storage-live/storage-dead in constants, but it was broken by the code that fixed https://github.com/rust-lang/rust/issues/38669 (I can't find the PR for that). It was relying on the "temp lifetime" which is no longer always the temp-lifetime. I think I'll add a more direct fix.
Niko Matsakis at 2017-05-15 23:52:13
Recently a change to reduce the size of some MIR structures has landed. Would be good to re-check.
Simonas Kazlauskas at 2017-05-18 16:57:30
@urschrei I believe that now two PRs have landed that (together) should dramatically reduce peak memory usage (#42023, #41926). Would you be so kind as to check whether things are working again once the next nightly is issued?
Niko Matsakis at 2017-05-23 12:56:50
@nikomatsakis Yep will do. Windows i686 is the last holdout; the current x86_64 Nightly is passing on macOS, Linux, and Windows.
Stephan Hügel at 2017-05-23 13:08:57
Still failing on Windows i686 using Nightly 5b13bff5203c1bdc6ac6dc87f69b5359a9503078.
Stephan Hügel at 2017-05-24 00:50:09
@urschrei bah humbug. Thanks!
Niko Matsakis at 2017-05-25 15:37:09
Next step: we should do some measurements (ideally with valgrind) of building this crate in various environments to try and see how much progress we've made, and how much we've got to go.
Niko Matsakis at 2017-05-25 20:14:29
OK so I attempted to run with massif but that failed miserably (ran for hours without terminating). I have deleted like 75% of the lines in the file and I'm giving it another try.
Niko Matsakis at 2017-05-26 19:03:31
@nikomatsakis So I did manage to run a massif run myself (same setup as yours) - I emailed the xz-ed massif report to you, it's pretty small.
I see multiple gigabytes of peak being used in borrowck of all things, in the nodemap returned in this place: https://github.com/rust-lang/rust/blob/master/src/librustc/middle/dataflow.rs#L180.
Ariel Ben-Yehuda at 2017-05-31 23:48:42
For borrowck, the only remaining improvement I can think of is, instead of keeping dataflow information per node, to treat linear runs of nodes as "basic blocks".
Then again, there are no variables, so perhaps the CFG representation and/or `nodeid_to_index` can be optimized to avoid that specific allocation, or it could be built lazily, when there are variables.
Eduard-Mihai Burtescu at 2017-06-01 16:27:23
@eddyb
Maybe we can use some sort of O(n) optimization there? For example, a single sorted `(NodeId, CFGIndex)` array (what's the typical ratio of node ids to cfg indexes?)
Ariel Ben-Yehuda at 2017-06-01 17:11:15
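The sorted-array idea in a standalone sketch (hypothetical `NodeId`/`CfgIndex` aliases, not the rustc types): build the pairs once, sort by node id, then answer lookups with binary search instead of holding a per-node hash map.

```rust
// Hypothetical aliases for the discussion above.
type NodeId = u32;
type CfgIndex = u32;

// A single sorted Vec of pairs: 8 bytes per entry and no hash-map
// overhead, at the cost of O(log n) lookups.
struct NodeToCfg {
    entries: Vec<(NodeId, CfgIndex)>,
}

impl NodeToCfg {
    fn new(mut entries: Vec<(NodeId, CfgIndex)>) -> Self {
        entries.sort_by_key(|&(node, _)| node);
        NodeToCfg { entries }
    }

    fn get(&self, node: NodeId) -> Option<CfgIndex> {
        self.entries
            .binary_search_by_key(&node, |&(n, _)| n)
            .ok()
            .map(|i| self.entries[i].1)
    }
}

fn main() {
    let map = NodeToCfg::new(vec![(30, 2), (10, 0), (20, 1)]);
    assert_eq!(map.get(20), Some(1));
    assert_eq!(map.get(15), None);
}
```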
@arielb1 I don't know, but for this exact case maybe you can just make the compiler print it?
Eduard-Mihai Burtescu at 2017-06-01 17:22:07
This does seem like low-hanging fruit. That table is using >21% of the memory, if I am reading the massif output correctly.
Niko Matsakis at 2017-06-08 17:47:31
I sort of like the idea of building it in a lazy fashion, even though it's not a very general purpose fix.
Niko Matsakis at 2017-06-08 17:48:14
I'll investigate that. Should be fairly easy, I would think. I can also determine the density here.
Niko Matsakis at 2017-06-08 17:49:51
I tried the lazy construction thing, but it doesn't seem to be working. In particular, I think we still wind up needing it, even for the big constant.
Niko Matsakis at 2017-06-13 20:17:33
r? @arielb1 -- due to not having much time, shifting over to @arielb1, who is planning on trying some clever hacks to skip borrowck for these huge constants for now
Niko Matsakis at 2017-07-27 20:41:08
@arielb1, can you tell us how https://github.com/rust-lang/rust/pull/43547 affected memory usage here? It seems to help quite a bit, right?
Michael Woerister at 2017-08-03 20:51:25
@urschrei
We improved memory usage on your case. Can you try to run CI again on a new nightly?
Ariel Ben-Yehuda at 2017-08-07 13:21:56
It's still failing on Windows i686 using nightly ba1d065ff: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/utg8w0i12vmytkvi
Stephan Hügel at 2017-08-07 14:33:35
status: the original issue was fixed but memory usage crept up. I'll try to un-creep it back.
Ariel Ben-Yehuda at 2017-08-10 20:27:07
Update from compiler team mtg: No progress, I think.
Niko Matsakis at 2017-08-31 20:18:36
Update from compiler team mtg: Still no progress. Let's run massif, at least! @arielb1 said they would do so.
Niko Matsakis at 2017-09-14 20:13:58
I uploaded some massif runs to dropbox (can you read them?).
Results:
- there's a 4.1GB peak in typeck caused by us basically creating a ton of unneeded inference variables. I believe I know how to fix it.
- there's another 3.9GB peak during translation. Old memory usage (from a June-era compiler) there was 2.9GB, so I need to see where that 1GB came from.
Ariel Ben-Yehuda at 2017-09-20 14:43:55
there's another 3.9GB peak during translation. Old memory usage (from a June-era compiler) there was 2.9GB, so I need to see where that 1GB came from.
I'm wondering if this is related to the async-llvm changes (https://github.com/rust-lang/rust/pull/43506)
Michael Woerister at 2017-09-22 13:29:44
@michaelwoerister seems plausible
Niko Matsakis at 2017-09-22 14:57:29
@michaelwoerister
I think we're only using 1 codegen unit. Shouldn't that make async LLVM a no-op?
Ariel Ben-Yehuda at 2017-09-24 11:24:16
actually, all memory usage is accounted for, so I'm sure it's not LLVM.
Ariel Ben-Yehuda at 2017-09-24 11:41:43
@arielb1 Oh right, with one CGU memory usage should be the same unless there's some kind of bug.
Michael Woerister at 2017-09-25 08:17:38
@petrochenkov's span reform PR lost us 400 MB of space here, which means making this compile is that much harder. Maybe we should just use enough bits for the crate when encoding spans?
Ariel Ben-Yehuda at 2017-09-27 16:44:45
or use 40 bit spans, which should be enough to handle 64MB crates?
Ariel Ben-Yehuda at 2017-09-27 19:28:39
@arielb1 I'm not sure I understand: The span PR makes this test case use 400 MB more -- or less?
Michael Woerister at 2017-09-28 19:59:25
it makes the test case use 400MB more, because the span just overflows the 24-bit length we have.
Ariel Ben-Yehuda at 2017-09-28 20:06:47
The current peak is during LLVM translation where we are using memory from both MIR and LLVM. I think that after miri lands, we could easily store constants as byte arrays, which should bring this well into the green zone.
Ariel Ben-Yehuda at 2017-09-28 20:21:29
triage: P-medium
It seems like we are going to wait until we can fix this properly.
Niko Matsakis at 2017-11-09 21:14:00
miri has landed. I'm not entirely sure how to reproduce this though.
Oli Scherer at 2018-04-17 14:51:38
Is it on the latest Nightly? If so, I’ll kick off an Appveyor build in a couple of hours.
Stephan Hügel at 2018-04-17 15:05:59
Yup, it's even in beta (although somewhat broken).
Oli Scherer at 2018-04-17 15:14:07
Just ran into #49930, which causes…some failures. Will hold off and try again when the backport lands.
Stephan Hügel at 2018-04-17 19:44:23
we have new nightlies!
Oli Scherer at 2018-04-19 06:59:25
i686 is failing on ac3c2288f (2018-04-18) with exit code: 3221225501: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/qaipuqj88243xs4c#L115
(Which is a memory exhaustion error I think?)
Stephan Hügel at 2018-04-20 08:13:39
Looks like it, because the x86_64 ~~target~~ host works. You can probably build for the `i686` target on the `x86_64` host.
Oli Scherer at 2018-04-20 09:22:59
@oli-obk Oh, I had no idea – can you point me at some details?
Stephan Hügel at 2018-04-20 16:26:38
You need to install the cross toolchain via rustup and invoke cargo with the target flag for that cross target
Oli Scherer at 2018-04-21 08:29:45
Not sure whether this is exactly the same issue, but compiling the test suite for v0.8.5 of the `xxhash-rust` crate also failed on 32-bit platforms due to `rustc` memory exhaustion. In that crate, `test-vector.rs` contained a series of ~5K `assert_eq!` expansions, and `rustc` was consuming > 6 GiB of RAM during the build on 64-bit platforms (which obviously won't be possible for a 32-bit `rustc` process).

Future versions of that crate will work around the failure by changing the test.

I don't know how to run the more detailed RAM diagnostics @nikomatsakis and @alexcrichton report above, but I offer this as another example of excessive memory allocation that fails on 32-bit platforms. I can also report this as a separate issue, if that would be useful.
dkg at 2022-04-22 13:54:33
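For anyone hitting the same wall: the usual workaround is to replace thousands of `assert_eq!` expansions with one loop over a static table, so the compiler sees a single small function body plus data instead of ~5K macro expansions. A sketch with a hypothetical hash function and made-up test vectors:

```rust
// Hypothetical function under test (a stand-in, not xxhash).
fn toy_hash(x: u64) -> u64 {
    x.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

// One static table instead of thousands of assert_eq! invocations:
// the compiler builds a single loop, not ~5K expanded statements.
static TEST_VECTORS: &[(u64, u64)] = &[
    (0, 0),
    (1, 0x9E37_79B9_7F4A_7C15),
    (2, 0x3C6E_F372_FE94_F82A),
];

fn run_vectors() -> usize {
    let mut checked = 0;
    for &(input, expected) in TEST_VECTORS {
        assert_eq!(toy_hash(input), expected, "input {}", input);
        checked += 1;
    }
    checked
}

fn main() {
    assert_eq!(run_vectors(), TEST_VECTORS.len());
}
```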