Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta

ea9b116
Opened by Stephan Hügel at 2023-10-25 21:37:58

I'm trying to build a cdylib (https://github.com/urschrei/lonlat_bng) which requires the https://github.com/urschrei/ostn15_phf crate.

Building on AppVeyor on i686-pc-windows-gnu, using the latest beta, fails with an OOM error:


The ostn15_phf crate is essentially just a very big static map, built using PHF (the generated, uncompiled map is around 42.9 MB).

The build passed when running cargo test, using rustc 1.12.0-beta.3 (341bfe43c 2016-09-16): https://ci.appveyor.com/project/urschrei/lonlat-bng/build/105/job/3y1llt6luqs3phs3

It's now failing when running cargo test, using rustc 1.12.0-beta.6 (d3eb33ef8 2016-09-23): https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/27pgrkx2cnn2gw50

The failure occurs when compiling ostn15_phf, with `fatal runtime error: out of memory`.

  1. cc @brson just want to make sure you're aware of this.

    Steven Fackler at 2016-09-28 21:32:44

  2. Related to #36926 perhaps? ("1.12.0: High memory usage when linking in release mode with debug info")

    Niko Matsakis at 2016-10-04 09:38:01

  3. @urschrei have you tried this on platforms other than windows, out of curiosity?

    Niko Matsakis at 2016-10-06 13:33:24

  4. I see. It seems to work on x86_64-pc-windows-gnu, but not i686.

    Niko Matsakis at 2016-10-06 13:36:14

  5. I haven't tried to build i686 on Linux or OSX, but I easily could…

    Stephan Hügel at 2016-10-06 13:45:13

  6. Well, I just did a run on my linux box. The memory usage is certainly through the roof: https://gist.github.com/nikomatsakis/ea771dd69f12ebc5d3d5848fa59fb43a

    This is using nightly (rustc 1.13.0-nightly (a059cb2f3 2016-09-27)). The peak is around 4GB.

    Niko Matsakis at 2016-10-06 13:48:47

  7. So @alexcrichton has executed a massif run: https://gist.github.com/alexcrichton/d20d685dd7475b1801a2ccac6ba15b08

    ~~The peak result~~ Measurement 58 shows something like this:

    | Percentage | Area |
    | --- | --- |
    | 36% | HIR |
    | 9% | MIR |
    | 14% | type/region arenas |
    | 5% | region maps |
    | 2% | constants |

    The peak result (measurement 48) looks pretty similar. More memory used by MIR:

    | Percentage | Area |
    | --- | --- |
    | 26% | HIR |
    | 14% | MIR |
    | 5% | type/region arenas |
    | 4% | region maps |
    | 3% | constants |

    These numbers are just based on a kind of quick scan of the gist.

    Niko Matsakis at 2016-10-06 17:10:36

  8. It seems clear we need to revisit our handling of statics and constants in a big way. But then this has been clear for a long time. =) I'm wondering if we can find some kind of "quick fix" here.

    I also haven't compared with historical records -- but most of that memory we see above, we would have been allocating before too, so I'm guessing this is a case of being pushed just over the threshold on i686, versus a sudden spike.

    Niko Matsakis at 2016-10-06 17:16:02

  9. Sizes of some types from MIR (gathered from play):

    statement:      192
    statement-kind: 176
    lvalue:         16
    rvalue:         152
    local-decl:     48
    

    Niko Matsakis at 2016-10-07 17:15:34
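
    Sizes like these are typically gathered with std::mem::size_of on the playground. A minimal sketch of the technique, using invented stand-in types (LocalDecl and Lvalue below are not the real rustc definitions):

    ```rust
    use std::mem::size_of;

    // Stand-in types for illustration only; the real MIR types differ.
    #[allow(dead_code)]
    struct LocalDecl {
        mutability: u8,
        ty: usize,
        source_info: (u32, u32),
    }

    #[allow(dead_code)]
    enum Lvalue {
        Local(u32),
        Static(usize),
    }

    fn main() {
        // Print the shallow (per-node) size of each type.
        println!("local-decl: {}", size_of::<LocalDecl>());
        println!("lvalue:     {}", size_of::<Lvalue>());
    }
    ```

    Note these are shallow sizes; owned data behind a Vec or Box is not counted.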

  10. I am feeling torn here. It seems the best we can do short-term is to make some small changes to MIR/HIR and try to bring the peak down below 4GB. The long term fix would be to revisit the overall representation particularly around constants and see what we can do to bring the size down. One thing I was wondering about (which is probably answerable from @alexcrichton's measurements) is what percentage of memory is being used just in the side-tables vs the HIR itself.

    In any case, it seems like 152 bytes for a mir rvalue is really quite big.

    Niko Matsakis at 2016-10-12 19:42:37

  11. Looking again at the massif results, it looks like MIR is taking more memory than I initially thought. One place that seems to use quite a bit is the list of scopes.

    Niko Matsakis at 2016-10-12 19:48:53

  12. Better numbers:

    | percent | pass |
    | --- | --- |
    | 35.09% | HIR |
    | 34.96% | MIR |
    | 11.77% | mk_region, node_type_insert |
    | 6.24% | region_maps |
    | 2.42% | consts |
    | 1.39% | spans |

    Niko Matsakis at 2016-10-12 20:19:17

  13. Just pinging @nikomatsakis to keep this P-high bug on his radar.

    Brian Anderson at 2016-10-20 16:12:26

  14. I've made basically no progress here, I'm afraid. I think the most likely path forward in short term is to try and reduce memory usage in various data structures. Not very satisfying though. I'm not sure if we can make up the gap that way.

    Niko Matsakis at 2016-10-20 17:49:33

  15. Discussed in compiler meeting. Conclusion: miri would be the proper fix, but maybe we can shrink MIR a bit in the short term, perhaps enough to push us over the edge. I personally probably don't have time for this just now (have some other regr to examine). Hence re-assigning to @pnkfelix -- @arielb1 maybe also interested in doing something?

    Niko Matsakis at 2016-10-20 20:23:34

  16. cc @nnethercote in case he might have input on ways to attack this

    Felix S Klock II at 2016-10-27 20:16:52

    Massif is slow and the output isn't that easy to read, but it's the ideal tool for tackling this. The peak snapshot in @alexcrichton's data is number 48, and I've made a nicer, cut-down version of it here: https://gist.github.com/nnethercote/935db34ff2da854df8a69fa28c978497

    You can see that ~30% of the memory usage comes from lower_expr. I did a DHAT run (more on that in a moment) and this is the main culprit:

        ExprKind::Tup(ref elts) => {
            hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
        } 
    

    push_scope/pop_scope account for another ~15%.

    Nicholas Nethercote at 2016-10-28 04:20:19

  18. @nnethercote

    push_scope/pop_scope are MIR things. I think interning lvalues (they take 16 bytes, but occur multiple times per statement/terminator - they should take 4) and boxing rvalues (many statements are storage statements that don't have an rvalue) should claw back some memory.

    Also, enum Operand is very common, in most contexts just an lvalue, and takes 72 bytes. Something should be done about this, even just boxing constants (which would reduce it to 24 bytes, or 16 bytes with interning lvalues, or 8 bytes with interning lvalues and bitpacking).

    Ariel Ben-Yehuda at 2016-10-28 05:50:16
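
    The boxing suggestion can be sketched with stand-in types (BigConstant and both Operand enums here are invented; the 64-byte payload only mimics a directly embedded constant value):

    ```rust
    use std::mem::size_of;

    // A large constant payload, standing in for an embedded const value.
    #[allow(dead_code)]
    struct BigConstant([u8; 64]);

    // Embedding the constant inline makes every operand as big as the
    // largest variant...
    #[allow(dead_code)]
    enum OperandInline {
        Consume(u32),
        Constant(BigConstant),
    }

    // ...while boxing it shrinks the enum to roughly tag + pointer.
    #[allow(dead_code)]
    enum OperandBoxed {
        Consume(u32),
        Constant(Box<BigConstant>),
    }

    fn main() {
        println!("inline: {}", size_of::<OperandInline>());
        println!("boxed:  {}", size_of::<OperandBoxed>());
        assert!(size_of::<OperandBoxed>() < size_of::<OperandInline>());
    }
    ```

    The trade-off is one extra allocation and pointer chase per constant, which is cheap relative to the common "just an lvalue" case shrinking.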

  19. Having a 32-bit MIR "shared object representation" MirValue with 3/4 tag bits and the rest an index to an interner would basically apply all of these suggestions (where MirValue can represent all of Constant, Lvalue, Rvalue, Operand, and we keep the structs as strongly-typed newtypes).

    We can then intern MirValues and/or occasionally GC-intern them, for further memory gain.

    Ariel Ben-Yehuda at 2016-10-28 06:24:13
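
    A sketch of such a packed 32-bit handle (the names and the 3-bit tag width are illustrative, not from any real rustc code):

    ```rust
    // Pack a small kind tag and an interner index into one u32.
    const TAG_BITS: u32 = 3;
    const INDEX_BITS: u32 = 32 - TAG_BITS;

    #[derive(Copy, Clone)]
    struct MirValue(u32);

    impl MirValue {
        fn new(tag: u32, index: u32) -> MirValue {
            assert!(tag < (1 << TAG_BITS));
            assert!(index < (1 << INDEX_BITS));
            MirValue((tag << INDEX_BITS) | index)
        }
        fn tag(self) -> u32 {
            self.0 >> INDEX_BITS
        }
        fn index(self) -> u32 {
            self.0 & ((1 << INDEX_BITS) - 1)
        }
    }

    fn main() {
        let v = MirValue::new(2, 123_456);
        assert_eq!(v.tag(), 2);
        assert_eq!(v.index(), 123_456);
        assert_eq!(std::mem::size_of::<MirValue>(), 4);
        println!("ok");
    }
    ```

    Each kind (Constant, Lvalue, Rvalue, Operand) would keep its own interner table, indexed by the low bits.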

  20. push_scope/pop_scope account for another ~15%.

    Builder::scope_auxiliary ends up with 5.8M elements. It's only used when MIR is dumped. I tried removing it and the peak RSS dropped by 6.7% from 4.86 GB to 4.53 GB. It seems like it should be straightforward to make the scope_auxiliary pushes occur only when MIR dumping is enabled.

    Nicholas Nethercote at 2016-10-28 06:37:18

  21. @nnethercote

    Well, scopes are supposed to eventually be used for debuginfo, so we don't want to remove them (BTW, the Vec would be 33% smaller if VisibilityScope was a nonzero).

    BTW, how do you do your benchmarks?

    Ariel Ben-Yehuda at 2016-10-28 07:59:34
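
    The "nonzero" point is the niche optimization: wrapping the index in NonZeroU32 lets Option (and enum discriminants generally) reuse the forbidden zero value instead of adding a tag word. A sketch, where ScopeId is a stand-in for the real VisibilityScope:

    ```rust
    use std::mem::size_of;
    use std::num::NonZeroU32;

    #[allow(dead_code)]
    struct ScopeId(u32); // plain index: Option<ScopeId> needs a tag word

    #[allow(dead_code)]
    struct NonZeroScopeId(NonZeroU32); // zero is the niche: Option is free

    fn main() {
        assert_eq!(size_of::<Option<ScopeId>>(), 8);
        assert_eq!(size_of::<Option<NonZeroScopeId>>(), 4);
        println!("ok");
    }
    ```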

    Massif is slow and the output isn't that easy to read

    I was reminded of the fact that massif-visualizer exists on Linux. It's a nice graphical viewer for Massif output files. Much better than the textual output from ms_print.

    Nicholas Nethercote at 2016-10-31 02:09:49

  23. BTW, how do you do your benchmarks?

    With Massif and DHAT, both of which are Valgrind tools. https://gist.github.com/nnethercote/10ec6918e02029e7609c9bb8d594b986 shows how I invoke them on the rustc-benchmarks.

    Nicholas Nethercote at 2016-10-31 02:11:46

  24. @arielb1

    Well, scopes are supposed to eventually be used for debuginfo, so we don't want to remove them (BTW, the Vec would be 33% smaller if VisibilityScope was a nonzero).

    Do we need the full detail for debuginfo?

    Niko Matsakis at 2016-11-01 18:02:40

  25. Do we need the full detail for debuginfo?

    Good question. scope_auxiliary currently costs a 7% peak memory penalty for this workload, for the benefit of (a) pretty-printing, and (b) something involving debuginfo in the future. That's not compelling, IMHO, and I think we should definitely consider how to avoid it.

    Nicholas Nethercote at 2016-11-03 03:12:26

  26. Hmm, @eddyb thinks there should not be visibility scopes without variables. That's a bit confusing. @nnethercote, were you analyzing @alexcrichton's data? @alexcrichton, do you remember if you analyzed nightly or some earlier thing?

    Niko Matsakis at 2016-11-03 20:19:44

    So the push_scope memory is actually temporary values used to store the current scope during construction. I'm a bit surprised that it gets so large for constants, but it may make sense. Still, it's not really related to visibility scopes, and that memory should be freed after construction. This case is sort of a worst case because it is one gigantic function.

    Niko Matsakis at 2016-11-03 20:25:03

  28. triage: P-medium

    After lengthy discussion in the @rust-lang/compiler team mtg, we decided to demote this to medium. It'd be great to make some improvements to MIR representation, but we think it's unlikely we'll get enough to push this back under 4GB, and that the use case here is not representative (that is, MIR memory usage is not a big problem in general).

    The right fix for this particular code is definitely changing how we handle constants. Hopefully we can pursue that more aggressively. But we'd never backport such a thing.

    (Personally, I think it's still worth experimenting to see if we can lower MIR usage through various simple changes of the kind that have been discussed here.)

    Niko Matsakis at 2016-11-03 20:32:49

  29. @nnethercote, were you analyzing @alexcrichton's data?

    I have looked at that data and also done a bunch of Massif and DHAT runs myself, which showed much the same results.

    Nicholas Nethercote at 2016-11-03 21:18:11

  30. It'd be great to make some improvements to MIR representation

    HIR is a part of the problem too, lower_expr in particular. That memory lives as long as the MIR does. Is there a possibility of the HIR being freed earlier? Or is this another manifestation of the "one giant function" problem?

    Another note: most of the file consists of an array of elements like this:

    (694236, (96.220, -52.874, 53.638)),
    

    The nested tuple increases memory usage. If it was a flat 4-tuple that would reduce memory usage a decent amount. Four separate arrays would probably be even better. Having to rewrite the code to work around compiler limitations is far from ideal, but this info might be useful to @urschrei right now.

    Nicholas Nethercote at 2016-11-03 21:26:13
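
    The three encodings can be sketched with the example row above (the column names for the split arrays are invented, since their meanings aren't stated here):

    ```rust
    // Nested: each row costs an outer tuple expr, an inner tuple expr, and
    // four literal exprs in the AST/HIR.
    static NESTED: [(u32, (f64, f64, f64)); 1] =
        [(694236, (96.220, -52.874, 53.638))];

    // Flat 4-tuple: one tuple expr fewer per row.
    static FLAT: [(u32, f64, f64, f64); 1] =
        [(694236, 96.220, -52.874, 53.638)];

    // Struct-of-arrays: only literal exprs per row, one per column.
    static KEYS: [u32; 1] = [694236];
    static COL_A: [f64; 1] = [96.220];
    static COL_B: [f64; 1] = [-52.874];
    static COL_C: [f64; 1] = [53.638];

    fn main() {
        // All three encode the same data.
        assert_eq!(NESTED[0].0, FLAT[0].0);
        assert_eq!((NESTED[0].1).0, FLAT[0].1);
        assert_eq!(KEYS[0], FLAT[0].0);
        assert_eq!(COL_C[0], FLAT[0].3);
        println!("ok");
    }
    ```

    With millions of rows, each eliminated Expr node (152 bytes in the AST, 96 in HIR, per the stats below) adds up quickly.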

  31. @alexcrichton, do you remember if you analyzed nightly or some earlier thing?

    Unfortunately I've since forgotten, but I remember having to recompile rustc against the system allocator and it was likely this compiler or somewhere around there.

    Alex Crichton at 2016-11-03 21:44:50

  32. That memory lives as long as the MIR does. Is there a possibility of the HIR being freed earlier? Or is this another manifestation of the "one giant function" problem?

    This is the "one giant function" problem. We could try to be "Javascript-like" and parse HIR lazily when doing MIR lowering, but that's a fair bit of complexity.

    Ariel Ben-Yehuda at 2016-11-03 22:24:20

  33. I've implemented some visitors that collect data on the AST and HIR: https://github.com/rust-lang/rust/pull/37583 Their output for this crate looks like shown below.

    tl;dr: Reducing the size of ast::Expr (similar to what has already happened for hir::Expr) is probably a good investment.

    PRE EXPANSION AST STATS
    
    Name                Accumulated Size         Count     Item Size
    ----------------------------------------------------------------
    Mod                               40             1            40
    PathListItem                      72             2            36
    Local                             96             2            48
    Arm                              128             2            64
    Mac                              240             3            80
    FnDecl                           432             9            48
    StructField                      440             5            88
    Block                            480            10            48
    Stmt                             480            12            40
    ImplItem                         896             4           224
    Pat                            1_344            12           112
    Attribute                      3_408            71            48
    Item                           4_352            17           256
    PathSegment                    5_976            83            72
    Ty                             6_496            58           112
    Expr                           9_728            64           152
    ----------------------------------------------------------------
    Total                         34_608
    
    
    POST EXPANSION AST STATS
    
    Name                Accumulated Size         Count     Item Size
    ----------------------------------------------------------------
    Mod                               40             1            40
    Local                             48             1            48
    PathListItem                      72             2            36
    Arm                              128             2            64
    FnDecl                           336             7            48
    Stmt                             360             9            40
    Block                            384             8            48
    StructField                      440             5            88
    ImplItem                         896             4           224
    Pat                            1_232            11           112
    Attribute                      3_408            71            48
    Item                           4_352            17           256
    PathSegment                    7_128            99            72
    Ty                             7_168            64           112
    Expr                   1_013_064_952     6_664_901           152
    ----------------------------------------------------------------
    Total                  1_013_090_944
    
    
    HIR STATS
    
    Name                Accumulated Size         Count     Item Size
    ----------------------------------------------------------------
    Mod                               32             1            32
    Decl                              32             1            32
    Stmt                              40             1            40
    Local                             48             1            48
    PathListItem                      56             2            28
    Arm                               96             2            48
    FnDecl                           280             7            40
    StructField                      360             5            72
    Block                            384             8            48
    ImplItem                         704             4           176
    Pat                            1_056            11            96
    Path                           2_880            90            32
    Attribute                      3_408            71            48
    Item                           3_672            17           216
    Ty                             5_120            64            80
    PathSegment                    6_400           100            64
    Expr                     639_830_400     6_664_900            96
    ----------------------------------------------------------------
    Total                    639_854_968
    

    Michael Woerister at 2016-11-04 15:49:56

  34. tl;dr: Reducing the size of ast::Expr (similar to what has already happened for hir::Expr) is probably a good investment.

    These are interesting measurements, but I disagree with this conclusion. Let me explain why. Here is a screenshot of massif-visualizer's output for this program.

    [screenshot: massif-visualizer output]

    There is an early peak around the 180,000 ms mark which is dominated by the AST. However, that memory is soon reclaimed -- note how the green, yellow, and orange segments disappear. (That these segments correspond to AST memory is hard to tell from the screenshot, but is clear when I interact with massif-visualizer.)

    But the global peak is much later, around the 1e+06 ms mark. At this point memory usage is dominated by HIR, MIR and Contexts/Arenas. Reducing the AST size won't reduce the global peak.

    Massif really is the ideal tool for finding and reducing peak memory usage. It's exactly what it was designed for.

    Nicholas Nethercote at 2016-11-04 21:23:31

  35. These are interesting measurements, but I disagree with this conclusion.

    That's why I wrote that it's "probably a good investment" instead of "this will solve the problem" :) But you are right of course, it won't help with the later memory spike.

    This massif-visualizer output looks pretty neat! I'll have to give it a try.

    Michael Woerister at 2016-11-07 16:06:11

  36. Pinging @michaelwoerister @arielb1 @nikomatsakis what's the status of this? Can we get it fixed for good any time soon?

    Brian Anderson at 2016-11-17 17:09:45

  37. Did #37764 maybe fix this? It should have an impact on peak memory usage here, right?

    Michael Woerister at 2016-11-17 17:48:58

  38. Yep, looks like it did: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/119


    Stephan Hügel at 2016-11-17 18:20:13

  39. OK, I'm going to close this issue then, though we still obviously can do a lot to reduce memory usage for big static constants (and maybe MIR in general).

    Niko Matsakis at 2016-11-17 20:57:15

  40. Since this issue was filed, both #37445 and #37764 landed. Together they reduced peak RSS by roughly 20%. Not sure if that's enough of an improvement to avoid the OOM @urschrei was seeing.

    Taking a slightly longer view, here's a comparison of 1.11 vs a recent trunk, doing debug builds, measured with /usr/bin/time on Linux:

    1.11:

    104.13user 1.88system 1:45.51elapsed 100%CPU (0avgtext+0avgdata 8619484maxresident)k
    101768inputs+305992outputs (309major+1016440minor)pagefaults 0swaps
    

    Trunk:

    68.28user 0.96system 1:08.98elapsed 100%CPU (0avgtext+0avgdata 3887928maxresident)k
    0inputs+330936outputs (0major+522179minor)pagefaults 0swaps
    

    So peak memory is 2.21x better and compile time is 1.53x better.

    Nicholas Nethercote at 2016-11-17 22:05:35

  41. @nnethercote the latest nightly builds on i686 without OOM!


    Stephan Hügel at 2016-11-17 22:28:00

  42. An interesting observation gleaned via -Z print-type-sizes in #37770 : mir::TerminatorKind (256 bytes) has some low hanging fruit, in particular I suspect the Assert variant is not common and thus could have some of its structure boxed.

    print-type-size type: `mir::TerminatorKind` overall bytes: 256 align: 8
    print-type-size    discriminant bytes: 4
    print-type-size    variant Goto exact bytes: 4
    print-type-size        field .target bytes: 4
    print-type-size    variant If exact bytes: 84
    print-type-size        padding bytes: 4
    print-type-size        field .cond bytes: 72 align: 8
    print-type-size        field .targets bytes: 8
    print-type-size    variant Switch exact bytes: 52
    print-type-size        padding bytes: 4
    print-type-size        field .discr bytes: 16 align: 8
    print-type-size        field .adt_def bytes: 8
    print-type-size        field .targets bytes: 24
    print-type-size    variant SwitchInt exact bytes: 76
    print-type-size        padding bytes: 4
    print-type-size        field .discr bytes: 16 align: 8
    print-type-size        field .switch_ty bytes: 8
    print-type-size        field .values bytes: 24
    print-type-size        field .targets bytes: 24
    print-type-size    variant Resume exact bytes: 0
    print-type-size    variant Return exact bytes: 0
    print-type-size    variant Unreachable exact bytes: 0
    print-type-size    variant Drop exact bytes: 32
    print-type-size        padding bytes: 4
    print-type-size        field .location bytes: 16 align: 8
    print-type-size        field .target bytes: 4
    print-type-size        field .unwind bytes: 8
    print-type-size    variant DropAndReplace exact bytes: 104
    print-type-size        padding bytes: 4
    print-type-size        field .location bytes: 16 align: 8
    print-type-size        field .value bytes: 72
    print-type-size        field .target bytes: 4
    print-type-size        field .unwind bytes: 8
    print-type-size    variant Call exact bytes: 140
    print-type-size        padding bytes: 4
    print-type-size        field .func bytes: 72 align: 8
    print-type-size        field .args bytes: 24
    print-type-size        field .destination bytes: 32
    print-type-size        field .cleanup bytes: 8
    print-type-size    variant Assert exact bytes: 248
    print-type-size        padding bytes: 4
    print-type-size        field .cond bytes: 72 align: 8
    print-type-size        field .expected bytes: 1
    print-type-size        padding bytes: 7
    print-type-size        field .msg bytes: 152 align: 8
    print-type-size        field .target bytes: 4
    print-type-size        field .cleanup bytes: 8
    print-type-size    end padding bytes: 4
    

    Felix S Klock II at 2016-11-18 22:39:38

  43. Operand is 72 bytes?! Shrinking that alone might have a bigger impact than anything we'd do to terminators (which are just one per BB).

    Eduard-Mihai Burtescu at 2016-11-19 01:59:16

  44. I too have seen Operand come up multiple times while looking at this, and I agree that shrinking it could be a decent win. #37770 will help understanding why it's so big.

    Nicholas Nethercote at 2016-11-19 02:29:43

  45. Probably due to directly embedding const values.

    Eduard-Mihai Burtescu at 2016-11-19 02:35:32

  46. Tagging @brson @michaelwoerister @arielb1 @eddyb, because the OOM error has begun to re-occur on the following platforms:

    • Linux x86: Nightly and Beta
    • Windows x86 / i686: Nightly (not currently testing on Beta)

    On Mac OS x86, stable, beta, and nightly currently pass. On Linux, stable passes. On Windows x86, stable passes.

    Also to note that since ostn15_phf is too big for crates.io, a crater / cargobomb pass will never check it…

    Stephan Hügel at 2017-04-26 14:54:48

  47. can you bisect it?

    Ariel Ben-Yehuda at 2017-04-27 19:17:39

  48. Possibly related to https://github.com/rust-lang/rust/issues/40355. I'll see if I can find a working previous nightly tomorrow.

    Stephan Hügel at 2017-04-27 23:24:06

  49. Thanks for reopening @urschrei. Vexing problem.

    Brian Anderson at 2017-05-04 16:06:33

  50. Seems like we didn't take any action to reduce the sizes of various MIR constructs, is that correct?

    Niko Matsakis at 2017-05-04 16:10:04

  51. We discussed in the meeting. It's frustrating that peak memory usage creeped up here. We are considering the "proper fix", which would basically detect the fact that this is a massive constant and try to create a byte-array instead of MIR instructions. There is some debate as to the best way to do this (something relatively special-cased? or can we "execute with miri as we go"?). We'd also need to extend mir with the concept of an allocation and so forth so we can represent that.

    In the meantime, I'm going to mark as P-high and assign to myself with the hope that I can find some low-hanging fruit in the MIR definition to prune memory usage. I feel like we've not picked all that fruit yet.

    Niko Matsakis at 2017-05-11 20:51:31

  52. triage: P-high

    Niko Matsakis at 2017-05-11 20:51:36

  53. I think what I'd do is:

    • identify HIR subexpressions that we may turn into a byte buffer (very limited subset)
    • use HAIR to turn those subexpressions, one leaf at a time, into miri values
    • write the leaves into a miri allocation in their appropriate locations

    Eduard-Mihai Burtescu at 2017-05-11 21:03:36

  54. @eddyb that sounds like what I had in mind as well

    Niko Matsakis at 2017-05-12 11:02:13

  55. So I did some digging into the MIR representation. There is definitely some inefficiency here, but I think ultimately the best thing we could do in this case is to remove (from statics) the StorageLive and StorageDead annotations. Something like 60% of the MIR in this case is StorageLive/StorageDead statements, and those are totally meaningless in a static constant anyhow!

    Niko Matsakis at 2017-05-15 23:21:12

  56. Looks like we have some code to suppress storage-live/storage-dead in constants, but it was broken by the code that fixed https://github.com/rust-lang/rust/issues/38669 (I can't find the PR for that). It was relying on the "temp lifetime" which is no longer always the temp-lifetime. I think I'll add a more direct fix.

    Niko Matsakis at 2017-05-15 23:52:13

  57. Recently a change to reduce the size of some MIR structures has landed. Would be good to re-check.

    Simonas Kazlauskas at 2017-05-18 16:57:30

  58. @urschrei I believe that now two PRs have landed that (together) should dramatically reduce peak memory usage (#42023, #41926). Would you be so kind as to check whether things are working again once the next nightly is issued?

    Niko Matsakis at 2017-05-23 12:56:50

  59. @nikomatsakis Yep will do. Windows i686 is the last holdout; the current x86_64 Nightly is passing on macOS, Linux, and Windows.

    Stephan Hügel at 2017-05-23 13:08:57

  60. Still failing on Windows i686 using Nightly 5b13bff5203c1bdc6ac6dc87f69b5359a9503078.

    Stephan Hügel at 2017-05-24 00:50:09

  61. @urschrei bah humbug. Thanks!

    Niko Matsakis at 2017-05-25 15:37:09

  62. Next step: we should do some measurements (ideally with valgrind) of building this crate in various environments to try and see how much progress we've made, and how much we've got to go.

    Niko Matsakis at 2017-05-25 20:14:29

  63. OK so I attempted to run with massif but that failed miserably (ran for hours without terminating). I have deleted like 75% of the lines in the file and I'm giving it another try.

    Niko Matsakis at 2017-05-26 19:03:31

  64. @nikomatsakis So I did manage to run a massif run myself (same setup as yours) - I emailed the xz-ed massif report to you, it's pretty small.

    I see multiple gigabytes of peak being used in borrowck of all things, in the nodemap returned in this place: https://github.com/rust-lang/rust/blob/master/src/librustc/middle/dataflow.rs#L180.

    Ariel Ben-Yehuda at 2017-05-31 23:48:42

  65. For borrowck, the only remaining improvement I can think of is instead of keeping dataflow information per-node, linear runs of nodes could be treated as "basic blocks".

    Then again, there are no variables, so perhaps the CFG representation and/or nodeid_to_index can be optimized to avoid that specific allocation, or it could be built lazily, when there are variables.

    Eduard-Mihai Burtescu at 2017-06-01 16:27:23

  66. @eddyb

    Maybe we can use some sort of O(n) optimization there? For example, a single sorted (NodeId, CFGIndex) array (what's the typical ratio of node ids to cfg indexes?)

    Ariel Ben-Yehuda at 2017-06-01 17:11:15
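
    The sorted-pair idea, sketched with stand-in type names: lookup becomes a binary search over one flat allocation instead of a per-node hash map.

    ```rust
    type NodeId = u32;
    type CfgIndex = u32;

    // One contiguous Vec of pairs, sorted by NodeId: O(n) space with no
    // per-entry hashing overhead, and O(log n) lookups.
    struct SortedNodeMap {
        entries: Vec<(NodeId, CfgIndex)>,
    }

    impl SortedNodeMap {
        fn new(mut entries: Vec<(NodeId, CfgIndex)>) -> SortedNodeMap {
            entries.sort_by_key(|&(id, _)| id);
            SortedNodeMap { entries }
        }

        fn get(&self, id: NodeId) -> Option<CfgIndex> {
            self.entries
                .binary_search_by_key(&id, |&(id, _)| id)
                .ok()
                .map(|i| self.entries[i].1)
        }
    }

    fn main() {
        let map = SortedNodeMap::new(vec![(10, 0), (3, 1), (7, 2)]);
        assert_eq!(map.get(3), Some(1));
        assert_eq!(map.get(7), Some(2));
        assert_eq!(map.get(4), None);
        println!("ok");
    }
    ```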

  67. @arielb1 I don't know, but for this exact case maybe you can just make the compiler print it?

    Eduard-Mihai Burtescu at 2017-06-01 17:22:07

  68. This does seem like some low-hanging fruit here. That table is using >21% of the memory, if I am reading the massif output correctly.

    Niko Matsakis at 2017-06-08 17:47:31

  69. I sort of like the idea of building it in a lazy fashion, even though it's not a very general purpose fix.

    Niko Matsakis at 2017-06-08 17:48:14

  70. I'll investigate that. Should be fairly easy, I would think. I can also determine the density here.

    Niko Matsakis at 2017-06-08 17:49:51

    I tried the lazy construction thing, but it doesn't seem to be working. In particular, I think we still wind up needing it, even for the big constant.

    Niko Matsakis at 2017-06-13 20:17:33

  72. r? @arielb1 -- due to not having much time, shifting over to @arielb1, who is planning on trying some clever hacks to skip borrowck for these huge constants for now

    Niko Matsakis at 2017-07-27 20:41:08

  73. @arielb1, can you tell us how https://github.com/rust-lang/rust/pull/43547 affected memory usage here? It seems to help quite a bit, right?

    Michael Woerister at 2017-08-03 20:51:25

  74. @urschrei

    We improved memory usage on your case. Can you try to run CI again on a new nightly?

    Ariel Ben-Yehuda at 2017-08-07 13:21:56

  75. It's still failing on Windows i686 using nightly ba1d065ff: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/utg8w0i12vmytkvi

    Stephan Hügel at 2017-08-07 14:33:35

  76. status: the original issue was fixed but memory usage crept up. I'll try to un-creep it back.

    Ariel Ben-Yehuda at 2017-08-10 20:27:07

  77. Update from compiler team mtg: No progress, I think.

    Niko Matsakis at 2017-08-31 20:18:36

  78. Update from compiler team mtg: Still no progress. Let's run massif, at least! @arielb1 said they would do so.

    Niko Matsakis at 2017-09-14 20:13:58

  79. I uploaded some massif runs to dropbox (can you read them?).

    Results:

    • there's a 4.1GB peak in typeck caused by us basically creating a ton of unneeded inference variables. I believe I know how to fix it.
    • there's another 3.9GB peak during translation. Old memory usage (from a June-era compiler) there was 2.9GB, so I need to see where that 1GB came from.

    Ariel Ben-Yehuda at 2017-09-20 14:43:55

  80. there's another 3.9GB peak during translation. Old memory usage (from a June-era compiler) there was 2.9GB, so I need to see where that 1GB came from.

    I'm wondering if this is related to the async-llvm changes (https://github.com/rust-lang/rust/pull/43506)

    Michael Woerister at 2017-09-22 13:29:44

  81. @michaelwoerister seems plausible

    Niko Matsakis at 2017-09-22 14:57:29

  82. @michaelwoerister

    I think we're only using 1 codegen unit. Shouldn't that make async LLVM a no-op?

    Ariel Ben-Yehuda at 2017-09-24 11:24:16

  83. actually, all memory usage is accounted for, so I'm sure it's not LLVM.

    Ariel Ben-Yehuda at 2017-09-24 11:41:43

  84. @arielb1 Oh right, with one CGU memory usage should be the same unless there's some kind of bug.

    Michael Woerister at 2017-09-25 08:17:38

  85. @petrochenkov's span reform PR has cost us 400MB of space here, which means making this compile is that much harder. Maybe we should just use enough bits to cover the whole crate when encoding spans?

    Ariel Ben-Yehuda at 2017-09-27 16:44:45

  86. Or use 40-bit spans, which should be enough to handle 64MB crates?

    Ariel Ben-Yehuda at 2017-09-27 19:28:39

  87. @arielb1 I'm not sure I understand: The span PR makes this test case use 400 MB more -- or less?

    Michael Woerister at 2017-09-28 19:59:25

  88. It makes the test case use 400MB more, because the span's length overflows the 24-bit field we have.

    Ariel Ben-Yehuda at 2017-09-28 20:06:47
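
The 24-bit overflow @arielb1 describes can be illustrated with a small sketch. The names and layout here are hypothetical, not rustc's actual span representation: the point is only that a 24-bit length field tops out at roughly 16 MiB, well below the ~42.9 MB generated source mentioned at the top of this issue.

```rust
// A minimal sketch of a 24-bit span-length limit, as described above.
// Names and layout are illustrative, not rustc internals.
const LEN_BITS: u32 = 24;
const MAX_LEN: u32 = (1 << LEN_BITS) - 1; // 16_777_215 bytes (~16 MiB)

// A span whose length fits in 24 bits can use the compact encoding;
// anything longer must fall back to a larger representation.
fn fits_compact(len: u32) -> bool {
    len <= MAX_LEN
}

fn main() {
    assert!(fits_compact(1 << 20)); // a 1 MiB item fits
    // A ~43 MB generated source file overflows the field:
    assert!(!fits_compact(43 * 1024 * 1024));
    println!("24-bit max span length: {} bytes", MAX_LEN);
}
```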

  89. The current peak is during LLVM translation where we are using memory from both MIR and LLVM. I think that after miri lands, we could easily store constants as byte arrays, which should bring this well into the green zone.

    Ariel Ben-Yehuda at 2017-09-28 20:21:29
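
The "constants as byte arrays" idea can be sketched roughly as follows. The record layout and helper names are hypothetical, chosen only to illustrate decoding typed values out of one flat buffer instead of keeping a huge typed constant alive in the compiler's IR:

```rust
use std::convert::TryInto;

// Hypothetical record layout: two little-endian f64s per entry
// (e.g. an easting/northing shift in a coordinate-transform table).
fn entry(bytes: &[u8], index: usize) -> (f64, f64) {
    let off = index * 16;
    let a = f64::from_le_bytes(bytes[off..off + 8].try_into().unwrap());
    let b = f64::from_le_bytes(bytes[off + 8..off + 16].try_into().unwrap());
    (a, b)
}

fn main() {
    // A tiny stand-in table with two records, stored as one flat buffer.
    let mut buf = Vec::new();
    for &(a, b) in &[(1.5f64, 2.5f64), (3.0, 4.0)] {
        buf.extend_from_slice(&a.to_le_bytes());
        buf.extend_from_slice(&b.to_le_bytes());
    }
    assert_eq!(entry(&buf, 1), (3.0, 4.0));
    println!("record 1 = {:?}", entry(&buf, 1));
}
```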

  90. triage: P-medium

    It seems like we are going to wait until we can fix this properly.

    Niko Matsakis at 2017-11-09 21:14:00

  91. miri has landed. I'm not entirely sure how to reproduce this though.

    Oli Scherer at 2018-04-17 14:51:38

  92. Is it on the latest Nightly? If so, I’ll kick off an Appveyor build in a couple of hours.

    Stephan Hügel at 2018-04-17 15:05:59

  93. Yup, it's even in beta (although somewhat broken).

    Oli Scherer at 2018-04-17 15:14:07

  94. Just ran into #49930, which causes…some failures. Will hold off and try again when the backport lands.

    Stephan Hügel at 2018-04-17 19:44:23

  95. we have new nightlies!

    Oli Scherer at 2018-04-19 06:59:25

  96. i686 is failing on ac3c2288f (2018-04-18) with exit code: 3221225501: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/qaipuqj88243xs4c#L115

    (Which is a memory exhaustion error I think?)

    Stephan Hügel at 2018-04-20 08:13:39

  97. Looks like it, because the x86_64 ~~target~~ host works. You can probably build for the i686 target on the x86_64 host.

    Oli Scherer at 2018-04-20 09:22:59

  98. @oli-obk Oh, I had no idea – can you point me at some details?

    Stephan Hügel at 2018-04-20 16:26:38

  99. You need to install the cross toolchain via rustup and invoke cargo with the target flag for that cross target.

    Oli Scherer at 2018-04-21 08:29:45
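
Concretely, the setup described above amounts to something like the following, assuming an x86_64-pc-windows-gnu host as in this thread:

```shell
# Install the 32-bit Windows target's standard library, then build and
# test for it from the 64-bit host, so rustc itself runs as a 64-bit
# process with the full 64-bit address space.
rustup target add i686-pc-windows-gnu
cargo build --target i686-pc-windows-gnu
cargo test --target i686-pc-windows-gnu
```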

  100. Not sure whether this is exactly the same issue, but compiling the test suite for v0.8.5 of the xxhash-rust crate also failed on 32-bit platforms due to rustc memory exhaustion. In that crate, test-vector.rs contained a series of ~5K assert_eq! expansions, and rustc was consuming over 6 GiB of RAM during the build on 64-bit platforms (which obviously isn't possible for a 32-bit rustc process).

    Future versions of that crate will work around the failure by changing the test.

    I don't know how to run the more detailed RAM diagnostics @nikomatsakis and @alexcrichton reported above, but I offer this as another example of excessive memory allocation that fails on 32-bit platforms. I can also report this as a separate issue, if that would be useful.

    dkg at 2022-04-22 13:54:33
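
The workaround pattern mentioned above (collapsing thousands of assert_eq! expansions into one data-driven assertion) can be sketched like this. The hash function is a toy stand-in, not xxhash, and the vectors are made up:

```rust
// A toy hash function standing in for the real one under test.
fn toy_hash(input: &[u8]) -> u32 {
    input
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

fn main() {
    // One (input, expected) table instead of thousands of separate
    // assert_eq! statements, each of which is a distinct expression
    // that rustc must expand and type-check.
    let vectors: &[(&[u8], u32)] = &[
        (b"", 0),
        (b"a", 97),
        (b"ab", 3105),
    ];
    for (input, expected) in vectors {
        assert_eq!(toy_hash(input), *expected, "failed for {:?}", input);
    }
    println!("all {} vectors passed", vectors.len());
}
```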