regression: dead heap allocations aren't optimized out anymore

ac20fc6
Opened by Oli Scherer at 2023-10-08 20:49:30

#22159 was closed after a llvm update (#22526). It used to work (I remember I tried it out). Now it doesn't work anymore.

fn main() {
    let _ = Box::new(42);
}

http://is.gd/Wekr7w

LLVM-IR:

; ModuleID = 'rust_out.0.rs'
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: uwtable
define internal void @_ZN4main20h657e6a8d1dc11120eaaE() unnamed_addr #0 {
entry-block:
  %0 = tail call i8* @je_mallocx(i64 4, i32 0)
  %1 = icmp eq i8* %0, null
  br i1 %1, label %then-block-57-.i.i, label %"_ZN5boxed12Box$LT$T$GT$3new19h823444268625886800E.exit"

then-block-57-.i.i:                               ; preds = %entry-block
  tail call void @_ZN3oom20he7076b57c17ed7c6HYaE()
  unreachable

"_ZN5boxed12Box$LT$T$GT$3new19h823444268625886800E.exit": ; preds = %entry-block
  %2 = bitcast i8* %0 to i32*
  store i32 42, i32* %2, align 4
  %3 = icmp eq i8* %0, inttoptr (i64 2097865012304223517 to i8*)
  br i1 %3, label %"_ZN14Box$LT$i32$GT$8drop.86517h1e7c6ecb62969b35E.exit", label %cond.i

cond.i:                                           ; preds = %"_ZN5boxed12Box$LT$T$GT$3new19h823444268625886800E.exit"
  tail call void @je_sdallocx(i8* %0, i64 4, i32 0)
  br label %"_ZN14Box$LT$i32$GT$8drop.86517h1e7c6ecb62969b35E.exit"

"_ZN14Box$LT$i32$GT$8drop.86517h1e7c6ecb62969b35E.exit": ; preds = %"_ZN5boxed12Box$LT$T$GT$3new19h823444268625886800E.exit", %cond.i
  ret void
}

define i64 @main(i64, i8**) unnamed_addr #1 {
top:
  %2 = tail call i64 @_ZN2rt10lang_start20he050f8de3bcc02b7VRIE(i8* bitcast (void ()* @_ZN4main20h657e6a8d1dc11120eaaE to i8*), i64 %0, i8** %1)
  ret i64 %2
}

declare i64 @_ZN2rt10lang_start20he050f8de3bcc02b7VRIE(i8*, i64, i8**) unnamed_addr #1

declare noalias i8* @je_mallocx(i64, i32) unnamed_addr #1

; Function Attrs: cold noinline noreturn
declare void @_ZN3oom20he7076b57c17ed7c6HYaE() unnamed_addr #2

declare void @je_sdallocx(i8*, i64, i32) unnamed_addr #1

attributes #0 = { uwtable "split-stack" }
attributes #1 = { "split-stack" }
attributes #2 = { cold noinline noreturn "split-stack" }
  1. hmmm, I'm not so sure if it's a regression. It still works for vectors: http://is.gd/yNAFF6 . Maybe it never worked for boxes?

    Oli Scherer at 2015-04-08 15:02:07

  2. So, where do we go with this ticket? Was/is this a regression? is it worth tracking?

    Steve Klabnik at 2016-05-24 21:16:00

  3. @steveklabnik Probably a regression from when we moved from actual zeroing-drop to using the non-zero drop pattern. The check for that pattern basically says "leak this if the adress matches the drop pattern", so LLVM cannot remove the allocation because that would change semantics. Dynamic drop will fix this, and I think we should keep this open to track that, just to make sure.

    Björn Steinbrink at 2016-05-24 23:23:52

  4. This appears to be fixed: https://is.gd/kT0Gs9 Is this worth creating some kind of regression test for?

    Wesley Wiser at 2017-03-04 20:31:14

  5. We could test it by adding a run pass test that uses a custom allocator (which doesn't allocate, but panics).

    Oli Scherer at 2017-03-04 22:49:45

  6. tentatively needstest, then. Maybe a src/test/codegen test?

    bluss at 2017-03-04 23:20:43

  7. Seems to not work again on nightly.

    Simonas Kazlauskas at 2017-05-22 19:01:11

  8. I've used @Mark-Simulacrum 's bisection tool to find out the regressing commit. It was the LLVM 4.0 upgrade, commit 0777c757a6832dc5f8f218377f99960f5477311f .

    est31 at 2017-05-22 21:50:25

  9. Assigning myself just so it stays on my to-do list. Not immediately working on it, though. Feel free to take over if you want.

    Simonas Kazlauskas at 2017-06-01 16:24:28

  10. Historically this optimization was done by https://github.com/rust-lang/llvm/commit/4daef480d1241d040a247440d24013cbdeb741e7 but this commit was not carried forward to our current branch when the 4.0 upgrade was done because it no longer applies cleanly (IIRC)

    Alex Crichton at 2017-06-01 16:26:03

  11. Sadly, I couldn’t make the patch above to work. In fact, some testing seems to indicate that the responsibility for eliding allocations has been moved out of LLVM into clang.

    Namely, code like this

    #include<stdlib.h>
    void wowzers(int y) {
        void *z = malloc(y);
        if(z != NULL) free(z);
    }
    

    when compiled with clang test.c -emit-llvm -S -O3 -fno-builtin (clang version being 4.0) has the allocations in the produced IR

    ; Function Attrs: nounwind sspstrong uwtable
    define void @wowzers(i32) local_unnamed_addr #0 {
      %2 = zext i32 %0 to i64
      %3 = tail call noalias i8* @malloc(i64 %2) #2
      %4 = icmp eq i8* %3, null
      br i1 %4, label %6, label %5
    ; <label>:5:                                      ; preds = %1
      tail call void @free(i8* nonnull %3) #2
      br label %6
    ; <label>:6:                                      ; preds = %1, %5
      ret void
    }
    

    whereas if the special handling of the built-ins is not disabled (i.e. without -fno-builtin), it compiles to a ret void even with -O0.

    This seems to suggest to me that, unless we do some serious patchwork on LLVM (pretty sure we don’t want to do that), it falls onto rustc to optimise out heap allocations now.

    This could also very well be a bug. I have a test case on hand that does optimise out on 3.9 but not on 4.0. I might do a bisection.

    Simonas Kazlauskas at 2017-06-03 15:25:09

  12. Okay, never mind. I got it to work.

    Simonas Kazlauskas at 2017-06-03 16:10:54

  13. The LLVM upgrade reintroduces optimisation for _rust_allocate, but the optimisation for __rust_allocate_zeroed was backed out due to weird UB-like behaviour with bitvec iterators in rustc_data_structures.

    Should investigate eventually. Assigning myself.

    Simonas Kazlauskas at 2017-06-15 16:14:42

  14. cc @arielb1 you were interested in this yesterday.

    Simonas Kazlauskas at 2017-06-15 16:15:04

  15. I believe this is no longer a regression, so untagging as a regression.

    Alex Crichton at 2017-07-23 15:31:40

  16. It's not very clear to me whether this is still actionable. The original snippet doesn't seem to be misoptimised anymore. Can someone produce a new snippet that demonstrates the continued existence of the issue? Should the issue be closed?

    Cc @rust-lang/wg-codegen

    Anthony Ramine at 2018-04-02 11:37:29

  17. This should stay open as only a handful of alllocating functions are currently handled (i.e. malloc equivalent but not realloc IIRC).

    On Mon, Apr 2, 2018, 14:37 Anthony Ramine notifications@github.com wrote:

    It's not very clear to me whether this is still actionable. The original snippet doesn't seem to be misoptimised anymore. Can someone produce a new snippet that demonstrates the continued existence of the issue? Should the issue be closed?

    Cc @rust-lang/wg-codegen https://github.com/orgs/rust-lang/teams/wg-codegen

    — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/24194#issuecomment-377912330, or mute the thread https://github.com/notifications/unsubscribe-auth/AApc0mikNK6SdwDWKu-Gccdw9qaNIrHUks5tkg2RgaJpZM4D8f_0 .

    Simonas Kazlauskas at 2018-04-03 09:13:40

  18. Or was it calloc-equivalent.

    On Tue, Apr 3, 2018, 12:13 Simonas Kazlauskas s@kazlauskas.me wrote:

    This should stay open as only a handful of alllocating functions are currently handled (i.e. malloc equivalent but not realloc IIRC).

    On Mon, Apr 2, 2018, 14:37 Anthony Ramine notifications@github.com wrote:

    It's not very clear to me whether this is still actionable. The original snippet doesn't seem to be misoptimised anymore. Can someone produce a new snippet that demonstrates the continued existence of the issue? Should the issue be closed?

    Cc @rust-lang/wg-codegen https://github.com/orgs/rust-lang/teams/wg-codegen

    — You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/24194#issuecomment-377912330, or mute the thread https://github.com/notifications/unsubscribe-auth/AApc0mikNK6SdwDWKu-Gccdw9qaNIrHUks5tkg2RgaJpZM4D8f_0 .

    Simonas Kazlauskas at 2018-04-03 09:15:03

  19. @nagisa Any snippet demonstrating the issue?

    Anthony Ramine at 2018-04-03 09:25:41

  20. #![feature(allocator_api)]
    use std::heap::{Heap, Layout, Alloc};
    
    pub unsafe fn alloc_zeroed_doesnt_optimise() {
        let _ = Heap.alloc_zeroed(Layout::from_size_align_unchecked(4, 8));
    }
    
    extern "C" {
        fn foo(x: *mut u8);
    }
    
    pub unsafe fn alloc_zeroed_should_optimise_rezeroing() {
        let a = Heap.alloc_zeroed(Layout::from_size_align_unchecked(16, 8)).unwrap();
        let slc = ::std::slice::from_raw_parts_mut(a, 16);
        for i in slc.iter_mut() {
            *i = 0;
        }
        foo(slc.as_mut_ptr());
    }
    

    Simonas Kazlauskas at 2018-04-03 10:24:59

  21. For searchability purposes, the functions are now called __rust_alloc_zeroed and __rust_dealloc.

    Jake Goulding at 2018-07-05 15:07:48

  22. This was noticed in the wild on Stack Overflow.

    Jake Goulding at 2018-07-05 15:09:17

  23. It would be nice if we didn't have to patch LLVM to support custom allocation functions. Unfortunately the last discussion on that topic didn't really end up anywhere: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093625.html

    Nikita Popov at 2018-11-05 15:18:00

  24. The only way around that is to guarantee this optimization via MIR optimizations. But our MIR optimization story is nowhere near the level required for that.

    Oli Scherer at 2018-11-05 15:31:25

  25. Couldn't this be generalized by having a "pure, no side-effects" unsafe attribute?

    Arthur Silva at 2018-11-05 15:34:08