Revise usage of LLVM lifetime intrinsics
Currently, rustc emits lifetime intrinsics that declare the shortest possible lifetime for each alloca. Unfortunately, this prevents some optimizations. For example, @eddyb came up with the following case:
```rust
#![crate_type="lib"]
extern crate test;

#[derive(Copy)]
struct Big {
    large: [u64; 100000],
}

pub fn test_func() {
    let x = Big {
        large: [0; 100000],
    };
    test::black_box(x);
}
```
This currently results in the following optimized IR:
```llvm
define void @_ZN9test_func20hef205289cff69060raaE() unnamed_addr #0 {
entry-block:
  %x = alloca %struct.Big, align 8
  %0 = bitcast %struct.Big* %x to i8*
  %arg = alloca %struct.Big, align 8
  call void @llvm.lifetime.start(i64 800000, i8* %0)
  call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 800000, i32 8, i1 false)
  %1 = bitcast %struct.Big* %arg to i8*
  call void @llvm.lifetime.start(i64 800000, i8* %1)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* %0, i64 800000, i32 8, i1 false)
  call void asm "", "r,~{dirflag},~{fpsr},~{flags}"(%struct.Big* %arg) #2, !noalias !0, !srcloc !3
  call void @llvm.lifetime.end(i64 800000, i8* %1) #2, !alias.scope !4, !noalias !0
  call void @llvm.lifetime.end(i64 800000, i8* %1)
  call void @llvm.lifetime.end(i64 800000, i8* %0)
  ret void
}
```
As you can see, there are still two allocas: one for `x` and one for its copy in `%arg`, which is what gets passed to the `test::black_box` call. Since `x` is not used otherwise, we could call `memset` directly on `%arg` and drop `%x` altogether. But the lifetime of `%arg` only starts after the `memset` call, so the optimization doesn't happen. Moving the call to `llvm.lifetime.start` up makes the optimization possible.
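To make the ordering argument concrete, here is a toy model in Rust. It is a sketch under stated assumptions: the `Inst` enum and `forward_memset` are hypothetical names, and this is neither LLVM's IR nor the actual legality checks of LLVM's memcpy optimizations. It rewrites `memset(%x); memcpy(%arg, %x)` into a single memset into `%arg` only if nothing touches `%arg` between the two. A `lifetime.start(%arg)` in that window counts as a blocker, because the object's contents are undefined again right after `llvm.lifetime.start`, so bytes written before it would be lost.

```rust
/// Toy instruction set loosely mirroring the IR in this issue. Hypothetical:
/// this is neither LLVM's IR nor LLVM's actual legality rules.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    LifetimeStart(&'static str),
    LifetimeEnd(&'static str),
    Memset { dst: &'static str, val: u8 },
    Memcpy { dst: &'static str, src: &'static str },
    BlackBox(&'static str), // stands in for the inline asm that reads %arg
}

/// Does `inst` touch the stack slot named `slot` in any way?
fn mentions(inst: &Inst, slot: &str) -> bool {
    match *inst {
        Inst::LifetimeStart(s) | Inst::LifetimeEnd(s) | Inst::BlackBox(s) => s == slot,
        Inst::Memset { dst, .. } => dst == slot,
        Inst::Memcpy { dst, src } => dst == slot || src == slot,
    }
}

/// Turn `memset(src); ...; memcpy(dst, src)` into one memset into `dst`,
/// dropping the temporary `src`. Returns `None` if the rewrite is illegal.
fn forward_memset(code: &[Inst]) -> Option<Vec<Inst>> {
    // The first copy `dst <- src` in the sequence.
    let (ci, dst, src) = code.iter().enumerate().find_map(|(i, inst)| match *inst {
        Inst::Memcpy { dst, src } => Some((i, dst, src)),
        _ => None,
    })?;
    // The last memset into `src` before that copy.
    let (si, val) = code[..ci].iter().enumerate().rev().find_map(|(i, inst)| match *inst {
        Inst::Memset { dst: d, val } if d == src => Some((i, val)),
        _ => None,
    })?;
    // Nothing may touch `dst` between the memset and the copy. A
    // lifetime.start(dst) here makes dst's contents undefined again,
    // so writing into dst at the memset position would lose the bytes.
    if code[si + 1..ci].iter().any(|inst| mentions(inst, dst)) {
        return None;
    }
    // `src` must be a pure temporary: apart from its lifetime markers,
    // only the memset and the copy may touch it.
    let pure_temp = code.iter().enumerate().all(|(i, inst)| {
        i == si
            || i == ci
            || !mentions(inst, src)
            || matches!(inst, Inst::LifetimeStart(_) | Inst::LifetimeEnd(_))
    });
    if !pure_temp {
        return None;
    }
    // Rewrite: the memset goes straight into `dst`; the copy and `src` vanish.
    let out: Vec<Inst> = code
        .iter()
        .enumerate()
        .filter_map(|(i, inst)| {
            if i == si {
                Some(Inst::Memset { dst, val })
            } else if i == ci || mentions(inst, src) {
                None
            } else {
                Some(*inst)
            }
        })
        .collect();
    Some(out)
}

fn main() {
    use crate::Inst::*;
    // The order rustc emitted above: %arg's lifetime starts after the memset.
    let emitted = [
        LifetimeStart("x"),
        Memset { dst: "x", val: 0 },
        LifetimeStart("arg"),
        Memcpy { dst: "arg", src: "x" },
        BlackBox("arg"),
        LifetimeEnd("arg"),
        LifetimeEnd("x"),
    ];
    assert_eq!(forward_memset(&emitted), None);

    // Same code with lifetime.start(arg) hoisted above the memset.
    let hoisted = [
        LifetimeStart("arg"),
        LifetimeStart("x"),
        Memset { dst: "x", val: 0 },
        Memcpy { dst: "arg", src: "x" },
        BlackBox("arg"),
        LifetimeEnd("arg"),
        LifetimeEnd("x"),
    ];
    let rewritten = forward_memset(&hoisted).expect("rewrite should apply");
    assert_eq!(
        rewritten,
        vec![LifetimeStart("arg"), Memset { dst: "arg", val: 0 }, BlackBox("arg"), LifetimeEnd("arg")]
    );
    println!("{:?}", rewritten);
}
```

With the order shown in the IR above, the toy rewrite is rejected; hoisting `lifetime.start(%arg)` above the memset lets it through and eliminates `%x` entirely.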
Now, the lifetime intrinsics only buy us anything if the ranges don't overlap. If they do overlap, we may as well make them all start at the same point. One way might be to insert the start/end calls at positions that match up with scopes in the language, using an insertion marker (like we do for allocas) for the start calls and a cleanup scope for the end calls. This should also make things more robust than they currently are: we had a few misoptimization problems due to missing calls to `llvm.lifetime.end`. :-/
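The overlap argument can be sketched as a small slot-assignment model. Assumptions: a live range is the half-open interval of instruction indices between an alloca's `lifetime.start` and `lifetime.end`, and slots are assigned greedily in input order; `Range`, `assign_slots` and `slot_count` are hypothetical names, and this is not LLVM's stack-coloring pass. Two allocas can share a stack slot only when their ranges are disjoint, so for a pair that already overlaps, widening both to a common start does not change how many slots they need.

```rust
/// Hypothetical model of an alloca's live range, in instruction indices:
/// `start` is its lifetime.start (inclusive), `end` its lifetime.end (exclusive).
#[derive(Debug, Clone, Copy)]
struct Range {
    start: usize,
    end: usize,
}

/// Two half-open ranges overlap if each starts before the other ends.
fn overlaps(a: Range, b: Range) -> bool {
    a.start < b.end && b.start < a.end
}

/// Greedy assignment in input order: each alloca reuses the first slot whose
/// current occupants it does not overlap, otherwise it gets a fresh slot.
/// A toy, not LLVM's stack-coloring pass.
fn assign_slots(ranges: &[Range]) -> Vec<usize> {
    let mut slots: Vec<Vec<Range>> = Vec::new();
    let mut assignment = Vec::with_capacity(ranges.len());
    for &r in ranges {
        let free = slots
            .iter()
            .position(|occupants| occupants.iter().all(|&o| !overlaps(o, r)));
        let idx = match free {
            Some(i) => i,
            None => {
                slots.push(Vec::new());
                slots.len() - 1
            }
        };
        slots[idx].push(r);
        assignment.push(idx);
    }
    assignment
}

/// Number of distinct stack slots the greedy assignment uses.
fn slot_count(ranges: &[Range]) -> usize {
    assign_slots(ranges).iter().max().map_or(0, |&m| m + 1)
}

fn main() {
    // Two big temporaries in sequential scopes: disjoint lifetimes,
    // so both can live in the same slot.
    let disjoint = [Range { start: 0, end: 4 }, Range { start: 5, end: 9 }];
    assert_eq!(assign_slots(&disjoint), vec![0, 0]);

    // Overlapping lifetimes (like %x and %arg above) need two slots...
    let overlapping = [Range { start: 0, end: 6 }, Range { start: 3, end: 9 }];
    assert_eq!(slot_count(&overlapping), 2);

    // ...and starting both at the same point still needs exactly two.
    let widened = [Range { start: 0, end: 6 }, Range { start: 0, end: 9 }];
    assert_eq!(slot_count(&widened), 2);

    println!("disjoint: {:?}", assign_slots(&disjoint));
}
```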
Triage: I'm not aware of any change here.
Steve Klabnik at 2016-02-08 19:49:38
Updated code:
```rust
#![feature(test)]
#![crate_type="lib"]

extern crate test;

struct Big {
    large: [u64; 100000],
}

pub fn test_func() {
    let x = Big {
        large: [0; 100000],
    };
    test::black_box(x);
}
```

Now produces this IR:

```llvm
; Function Attrs: nounwind uwtable
define void @_ZN8rust_out9test_func17h4ab794041fab200cE() unnamed_addr #0 {
entry-block:
  %arg = alloca %Big, align 8
  %0 = bitcast %Big* %arg to i8*
  call void @llvm.lifetime.start(i64 800000, i8* %0)
  call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 800000, i32 8, i1 false)
  call void asm "", "r,~{dirflag},~{fpsr},~{flags}"(%Big* nonnull %arg) #2, !noalias !0, !srcloc !3
  call void @llvm.lifetime.end(i64 800000, i8* %0) #2, !alias.scope !4, !noalias !0
  call void @llvm.lifetime.end(i64 800000, i8* %0)
  ret void
}
```

This only contains one `alloca`, so this specific example is fixed, but I'm not aware of any changes to lifetime intrinsics. I think MIR trans does not employ them at all.

Jonas Schievink at 2016-06-09 22:32:58
Looks like the fix is very specific to this example, because LLVM can now transform `memset(%a, ...); memcpy(%b, %a, ...)` into `memset(%a, ...); memset(%b, ...)`, i.e. it only works for the `memset` case. This still exposes the problem:

```rust
#![feature(test)]
#![crate_type="lib"]

extern crate test;

struct Big {
    large: [u64; 100000],
}

#[inline(never)]
fn foo() -> Big {
    Big {
        large: [123; 100000],
    }
}

pub fn test_func() {
    let x = foo();
    test::black_box(x);
}
```

Björn Steinbrink at 2016-06-10 14:50:45
Triage: today's nightly still produces
```llvm
; playground::test_func
; Function Attrs: nounwind nonlazybind uwtable
define void @_ZN10playground9test_func17h2edfd294766010a4E() unnamed_addr #1 {
start:
  %_3 = alloca %Big, align 8
  %x = alloca %Big, align 8
  %0 = bitcast %Big* %x to i8*
  call void @llvm.lifetime.start.p0i8(i64 800000, i8* nonnull %0)
  ; call playground::foo
  call fastcc void @_ZN10playground3foo17he93a54860fc3d9b4E(%Big* noalias nocapture nonnull dereferenceable(800000) %x)
  %1 = bitcast %Big* %_3 to i8*
  call void @llvm.lifetime.start.p0i8(i64 800000, i8* nonnull %1)
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 %1, i8* nonnull align 8 %0, i64 800000, i1 false)
  call void asm sideeffect "", "r,~{dirflag},~{fpsr},~{flags}"(%Big* nonnull %_3) #3, !noalias !3, !srcloc !6
  call void @llvm.lifetime.end.p0i8(i64 800000, i8* nonnull %1)
  call void @llvm.lifetime.end.p0i8(i64 800000, i8* nonnull %0)
  ret void
}
```

Two allocas are still there.
Steve Klabnik at 2018-12-10 18:53:15
Can this still be reproduced? The current IR with Rust-side optimizations is:
<details><summary>Before LLVM optimization</summary>

```llvm
; main::test_func
; Function Attrs: uwtable
define void @_ZN4main9test_func17hb1368fde245db818E() unnamed_addr #1 {
start:
  %_2 = alloca [800000 x i8], align 8
  %x = alloca [800000 x i8], align 8
  ; call main::foo
  call void @_ZN4main3foo17h6f3f3fecffc2ba45E(ptr noalias nocapture noundef sret([800000 x i8]) align 8 dereferenceable(800000) %x)
  call void @llvm.lifetime.start.p0(i64 800000, ptr %_2)
  call void @llvm.memcpy.p0.p0.i64(ptr align 8 %_2, ptr align 8 %x, i64 800000, i1 false)
  call void asm sideeffect "", "r,~{memory}"(ptr %_2), !srcloc !2
  call void @llvm.lifetime.end.p0(i64 800000, ptr %_2)
  ret void
}
```

</details>

After LLVM optimization:

```llvm
; main::test_func
; Function Attrs: uwtable
define void @_ZN4main9test_func17hb1368fde245db818E() unnamed_addr #1 {
start:
  %_2 = alloca [800000 x i8], align 8
  call void @llvm.lifetime.start.p0(i64 800000, ptr nonnull %_2)
  ; call main::foo
  call fastcc void @_ZN4main3foo17h6f3f3fecffc2ba45E(ptr noalias nocapture noundef nonnull sret([800000 x i8]) align 8 dereferenceable(800000) %_2)
  call void asm sideeffect "", "r,~{memory}"(ptr nonnull %_2) #4, !srcloc !2
  call void @llvm.lifetime.end.p0(i64 800000, ptr nonnull %_2)
  ret void
}
```

That's only one `alloca`. Is there a different reproducer that still shows how short lifetimes block these optimizations?

Kalle Wachsmuth at 2024-09-06 20:05:33