rustc generates a lot of LLVM IR for small programs due to inline generated drops

f23f9bc
Opened by Jeff Muizelaar at 2023-10-08 20:50:25

Building https://github.com/jrmuizel/webrender/blob/sample-min/sample-min/src/main.rs ends up generating 12MB of LLVM IR (25MB of LLVM IR with debug info turned on). This seems like an excessive amount and it has a big impact on build times.

  1. webrender marks a lot of functions as #[inline] for performance, which causes their IR to be included in your crate. #[inline] has always been problematic for build times.

    Jonas Schievink at 2017-01-04 21:15:46

  2. But shouldn't those inline functions only be emitted if they are used by main.rs?

    Jeff Muizelaar at 2017-01-04 21:39:38

  3. @jrmuizel I think you need to use lto = true for the final binary if you want such cross-crate optimizations (dead code elimination, inlining, etc.) to happen correctly.

    Ingvar Stepanyan at 2017-01-04 21:49:54

  4. I'm not really looking for cross-crate optimizations so lto seems inappropriate.

    Why do we need to re-emit the ir for unused inline functions? How could these functions ever be used?

    Jeff Muizelaar at 2017-01-04 22:02:23

  5. They are used by the functions used by main.rs. All inline-functions transitively used by your code will be included in the crate's LLVM module IIRC. This gives LLVM the ability to inline all of them (if it decides to do so) into your code.

    Jonas Schievink at 2017-01-04 22:10:47

  6. So it looks like the bulk of this code comes from the drop function implementations: calling mem::forget on the return values from renderer::Renderer::new(opts) and compiling with -C panic=abort drops the IR size to 1.3MB. Is there a reason the drop functions need to be inline?

    Jeff Muizelaar at 2017-01-05 04:32:40
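
The experiment in comment 6 relies on the fact that a value passed to mem::forget is never dropped, so no drop call is needed at that point. A minimal sketch of that effect, assuming a hypothetical type `Noisy` as a stand-in for the much larger renderer types (the counter and helper functions are illustrative only, not taken from webrender):

```rust
use std::cell::Cell;
use std::mem;

thread_local! {
    // Counts how many times `Noisy::drop` has run on this thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

// Hypothetical stand-in for a type with a non-trivial destructor.
struct Noisy(String);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

// Lets the value go out of scope normally and reports how many drops ran.
fn drops_when_scoped() -> u32 {
    DROPS.with(|d| d.set(0));
    {
        let _v = Noisy(String::from("dropped"));
    } // the destructor for `Noisy` runs here
    DROPS.with(|d| d.get())
}

// Forgets the value instead and reports how many drops ran.
fn drops_when_forgotten() -> u32 {
    DROPS.with(|d| d.set(0));
    let v = Noisy(String::from("leaked"));
    mem::forget(v); // the destructor never runs; the String's buffer is leaked
    DROPS.with(|d| d.get())
}

fn main() {
    println!("scoped: {}", drops_when_scoped());
    println!("forgotten: {}", drops_when_forgotten());
}
```

Forgetting a value leaks whatever it owns, so this is useful as a measurement trick rather than as a fix.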

  7. I'm not really looking for cross-crate optimizations so lto seems inappropriate. Why do we need to re-emit the ir for unused inline functions?

    Well, you're looking for dead code elimination (not emitting functions from other crates that are unused), so you do want cross-crate optimization.

    Ingvar Stepanyan at 2017-01-05 11:58:29

  8. One reason is that automatically generated drop (like for Renderer) cannot be marked inline explicitly, so if it were not inlineable by default, programs would have no way to make it inlineable.

    bluss at 2017-01-05 11:59:27

  9. It seems like we could avoid having inline generated drop functions in debug builds though.

    Jeff Muizelaar at 2017-01-05 17:01:37

  10. It looks like the majority of the code is being added because of monomorphization caused by the inline drop.

    Jeff Muizelaar at 2017-04-05 20:48:43

  11. Running RUST_LOG=rustc_trans rustc on this example shows 4614 functions being translated. I wonder if it would be worth pursuing polymorphic codegen so that we don't get hammered by monomorphization.

    Jeff Muizelaar at 2017-07-11 19:16:13
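
To make the monomorphization cost in comment 11 concrete, here is a small hand-written sketch contrasting a generic function, which is instantiated once per concrete type, with a trait-object version that shares one body. This is not the compiler-level polymorphic codegen the comment asks about, only a source-level analogy; the function names are hypothetical.

```rust
use std::fmt::Display;

// Generic version: one copy of this function is generated per concrete `T` used.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// Trait-object version: a single copy is generated and calls go through a vtable.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}

fn main() {
    // Two instantiations: describe_generic::<i32> and describe_generic::<&str>.
    println!("{}", describe_generic(1));
    println!("{}", describe_generic("one"));
    // Both calls share the same function body.
    println!("{}", describe_dyn(&1));
    println!("{}", describe_dyn(&"one"));
}
```

The trade-off is the usual one: the shared version produces less code to compile, while the generic version gives the optimizer more to work with.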

  12. This codegen test suggests that drop glue is not marked for inlining by default: https://github.com/rust-lang/rust/commit/4ea25da237d75efc69d15824f6e04e2599420c38#diff-d13eb81c7eec73d0e1f5f62e9c15e987b7fac6b90e1924401152d1146200b92fR1-R2

    scottmcm at 2022-02-23 17:32:18

  13. Indeed. As I understand it, the problem is not that the code is marked as inline; it's that it's generic code that gets a monomorphized version generated for it instead of being shared. See #68414 and #84175.

    Jeff Muizelaar at 2022-02-23 21:01:40

  14. More specifically, for an example like:

    struct A(String);
    
    fn a(_: A) {}
    

    becomes something like:

    // Pseudocode for what the compiler generates, not valid source:
    fn a(x: A) {
        core::ptr::drop_in_place::<A>(&mut x);
    }
    

    which, even though it's not being inlined, gets codegened in place because of monomorphization.

    Jeff Muizelaar at 2022-05-15 14:06:06
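
The lowering described in comment 14 can be approximated in ordinary Rust by suppressing the automatic drop and calling drop_in_place by hand. This is a sketch under assumptions, not what rustc literally emits: the `Tracker` field and the `count_drops` helper are hypothetical additions so that the destructor call is observable.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::ptr;
use std::rc::Rc;

// Hypothetical helper: bumps a shared counter when dropped.
struct Tracker(Rc<Cell<u32>>);

impl Drop for Tracker {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Like `A` in the comment, plus a tracker field so the drop can be counted.
struct A(String, Tracker);

// Hand-written version of the implicit drop at the end of `fn a(_: A) {}`.
fn a(x: A) {
    // Suppress the automatic drop so the value is not destroyed twice.
    let mut x = ManuallyDrop::new(x);
    // SAFETY: `x` is initialized and is never used or dropped again afterwards.
    unsafe { ptr::drop_in_place::<A>(&mut *x) };
}

// Calls `a` once and reports how many times the tracker's destructor ran.
fn count_drops() -> u32 {
    let counter = Rc::new(Cell::new(0));
    a(A(String::from("hello"), Tracker(Rc::clone(&counter))));
    counter.get()
}

fn main() {
    println!("drops: {}", count_drops());
}
```

Because drop_in_place is generic over the pointee type, every distinct type dropped this way needs its own instantiation, which is the per-type code growth the thread is about.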