Performance regression in nightly-2017-01-15

Opened by Jon Gjengset at 2023-04-05 17:42:24

I've recently run into a performance regression in the Rust compiler in a project I'm working on. I've used nightlies to narrow it down to somewhere in 778988174..bf6d7b665. Specifically, the 2017-01-14 nightly is good, and 2017-01-15 is bad. The regression appears regardless of whether CARGO_INCREMENTAL=1 is set.

If you have a machine with a few cores, you can test it on a clone of https://github.com/mit-pdos/distributary/. Run with

$ cargo run -v --release --bin vote -- --avg --articles=100000 --runtime=10 --migrate=3 soup:// | grep 'PUT+'

On -14, the "migration" mentioned in the middle finishes quite quickly. On -15, it takes a long time, and so PUT never transitions to PUT+.

I wanted to try a git bisect to narrow the problem down further, but compiling it takes forever. A cursory glance over the commit range doesn't reveal anything particularly suspect, and #rust-internals told me to post here to get some more eyes on the problem.

  1. Here are the commits that landed between the two commit hashes above:

    https://github.com/rust-lang/rust/compare/7789881...bf6d7b6

    Corey Farwell at 2017-02-12 23:55:31

  2. In addition to the "migration" performance issue, note also that the benchmark's GET and PUT throughputs degraded substantially from 2017-01-14 to 2017-01-15:

    2017-01-14

    avg PUT: 197047.48
    avg GET0: 1332991.45
    avg GET: 1332991.45
    avg PUT+: 153669.90
    avg GET0+: 1305670.42
    avg GET+: 1332991.45
    

    2017-01-15

    avg PUT: 87309.19
    avg GET0: 941140.38
    avg GET: 941140.38
    avg PUT+: 75320.96
    avg GET0+: 921691.27
    avg GET+: 941140.38
    

    Malte Schwarzkopf at 2017-02-12 23:58:42

  3. So, after some more digging, it turns out that CARGO_INCREMENTAL does matter. If unset entirely, the performance regression is no longer present. CARGO_INCREMENTAL= and CARGO_INCREMENTAL=0 still experience the bug.

    I'm running a git bisect now, using ./x.py build --stage 1 src/libtest to rebuild rustc at each step, then cargo clean, and then env CARGO_INCREMENTAL=1 cargo build --bin vote --release followed by a benchmark run. Will report back once I have results.

    Jon Gjengset at 2017-02-13 00:43:22

  4. git bisect pointed to 5f90947c2c27d5d4ae30ccc0e83ffc95a8597128 as the first bad commit. I can confirm that using rustc built off of 02c7b117dadcfea4416f53b4bd99828bf750871f with CARGO_INCREMENTAL=1 produces the expected performance, and using rustc from 5f90947c2c27d5d4ae30ccc0e83ffc95a8597128 shows a significant performance decrease. If CARGO_INCREMENTAL is not set, 5f90947c2c27d5d4ae30ccc0e83ffc95a8597128 performs the same as 02c7b117dadcfea4416f53b4bd99828bf750871f.

    Jon Gjengset at 2017-02-13 02:48:18

  5. /cc #38944 @michaelwoerister

    Jon Gjengset at 2017-02-13 02:49:02

  6. FWIW, the project in question does a significant amount of work with HashMaps, so I would hazard a guess that the performance regression is somehow related to that. I'll see if I can cook up a benchmark. Could it be that this change is somehow preventing inlining of hashing or HashMap insertions when linking with std?

    Jon Gjengset at 2017-02-13 02:53:58

  7. And here's a minimal test case showing that HashMap::insert is ~4x slower with 5f90947c2c27d5d4ae30ccc0e83ffc95a8597128 and CARGO_INCREMENTAL=1:

    #![feature(test)]
    
    extern crate test;
    extern crate rand;
    
    use test::Bencher;
    use rand::Rng;
    
    #[bench]
    fn hashmap_insert(b: &mut Bencher) {
        let mut val: u32 = 0;
        let mut rng = rand::thread_rng();
        let mut map = std::collections::HashMap::new();
    
        // Keys are random u8s cast to usize, so the map holds at most 256
        // entries and the measurement is dominated by insert itself.
        b.iter(|| { map.insert(rng.gen::<u8>() as usize, val); val += 1; })
    }
    
    $ diff -r bench-bad/ bench-good/
    $ rustup override list | grep bench-
    ~/bench-good                           rust-good
    ~/bench-bad                            rust-bad
    $ git -C ~/rust-good rev-parse HEAD
    02c7b117dadcfea4416f53b4bd99828bf750871f
    $ git -C ~/rust-bad rev-parse HEAD
    5f90947c2c27d5d4ae30ccc0e83ffc95a8597128
    $ (~/bench-good) cargo bench 
       Compiling libc v0.2.20
       Compiling rand v0.3.15
       Compiling bench-hashmap v0.1.0 (file://~/bench-good)
        Finished release [optimized] target(s) in 7.3 secs
         Running target/release/deps/bench_hashmap-0ee5f7a4b83ca3d0
    
    running 1 test
    test hashmap_insert ... bench:          39 ns/iter (+/- 3)
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
    
    $ (~/bench-good) cargo clean; env CARGO_INCREMENTAL=1 cargo bench
       Compiling libc v0.2.20
       Compiling rand v0.3.15
       Compiling bench-hashmap v0.1.0 (file://~/bench-good)
        Finished release [optimized] target(s) in 7.39 secs
         Running target/release/deps/bench_hashmap-0ee5f7a4b83ca3d0
    
    running 1 test
    test hashmap_insert ... bench:          38 ns/iter (+/- 3)
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
    
    $ (~/bench-bad) cargo bench
       Compiling libc v0.2.20
       Compiling rand v0.3.15
       Compiling bench-hashmap v0.1.0 (file://~/bench-bad)
        Finished release [optimized] target(s) in 7.27 secs
         Running target/release/deps/bench_hashmap-e5736083e8c2908f
    
    running 1 test
    test hashmap_insert ... bench:          38 ns/iter (+/- 3)
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
    
    $ (~/bench-bad) cargo clean; env CARGO_INCREMENTAL=1 cargo bench
       Compiling libc v0.2.20
       Compiling rand v0.3.15
       Compiling bench-hashmap v0.1.0 (file://~/bench-bad)
        Finished release [optimized] target(s) in 7.42 secs
         Running target/release/deps/bench_hashmap-e5736083e8c2908f
    
    running 1 test
    test hashmap_insert ... bench:         133 ns/iter (+/- 13)
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
    

    Jon Gjengset at 2017-02-13 03:11:41

  8. Thanks for the report. My guess is that the code runs slower because less inlining is happening in incremental mode. It could probably be solved by strategic placement of some #[inline] annotations in libstd.

    cc @rust-lang/libs

    Michael Woerister at 2017-02-13 14:52:18
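    As a hypothetical illustration of the kind of annotation Michael suggests (this is not the actual libstd code): a #[inline] hint exports a function's body across crate and codegen-unit boundaries, so even incremental builds, which split a crate into many codegen units, can still inline it at call sites.

    ```rust
    // Hypothetical sketch (not actual libstd code): #[inline] makes the
    // body of a small, hot method available to callers in other crates
    // and codegen units, which would otherwise pay a full call for it
    // when incremental compilation prevents cross-unit inlining.
    pub mod mini {
        pub struct Counter {
            pub count: u64,
        }

        impl Counter {
            // Without #[inline], a caller in another codegen unit sees
            // only an opaque symbol for this one-add body.
            #[inline]
            pub fn bump(&mut self) -> u64 {
                self.count += 1;
                self.count
            }
        }
    }

    fn main() {
        let mut c = mini::Counter { count: 0 };
        for _ in 0..1000 {
            c.bump();
        }
        assert_eq!(c.count, 1000);
    }
    ```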

  9. In general it's expected that incremental compilation does not produce binaries as fast as non-incremental, right? (due to a lack of inlining)

    Alex Crichton at 2017-02-13 15:38:53

  10. @alexcrichton it's still curious to see a regression like this though, no? also, a 4x performance decrease for HashMap::insert seems like something that might affect a lot of people, and make them want to turn off incremental compilation entirely :/

    Jon Gjengset at 2017-02-14 02:31:00

  11. cc @nikomatsakis

    Aaron Turon at 2017-02-14 03:42:48

  12. @jonhoo oh sure it's bad, but for any slowdown factor you pick, someone can come up with a use case where one missed inlining costs that much. Release builds should likely avoid incremental for now; the main target today is debug builds, I believe.

    Alex Crichton at 2017-02-14 15:12:41

  13. I believe that this is "working as expected", yes, but it's very nice to have some test cases. We've been wanting to see if we can come up with better heuristics to help us with inlining across incremental units, and for that we need tests.

    Niko Matsakis at 2017-02-16 17:03:12

  14. ThinLTO might do away with this problem, if it is fast enough to be used for incremental compilation.

    Michael Woerister at 2017-03-06 15:00:33

  15. ThinLTO is now implemented and available to test this. Compiling the above benchmark in release mode, I get:

    test hashmap_insert ... bench:          39 ns/iter (+/- 0)
    

    Compiling in release incremental mode I get:

    test hashmap_insert ... bench:          77 ns/iter (+/- 1)
    

    Compiling in release incremental mode with ThinLTO I get:

    test hashmap_insert ... bench:          43 ns/iter (+/- 2)
    

    It's not 100% the same but I'd be pretty willing to bet that it's "close enough"

    Alex Crichton at 2017-10-16 14:49:59

  16. This looks very promising!

    Michael Woerister at 2017-10-16 15:15:25

  17. What should be the appropriate course of action for this ticket?

    Esteban Kuber at 2019-05-15 22:13:24