Missed optimization: references from pointers aren't treated as noalias
The following code results in x being dereferenced twice:
pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
*a = *x;
*b = *x;
}
That is to be expected. As far as I know, the Rust language spec doesn't give a way to mark unsafe raw pointers as noalias. It does, however, say that references are treated as noalias, so the following (correctly) optimizes out the extra dereference for both foo and bar:
pub fn foo(a: &mut i32, b: &mut i32, x: &i32) {
*a = *x;
*b = *x;
}
pub unsafe fn bar(a: *mut i32, b: *mut i32, x: *const i32) {
foo(&mut *a, &mut *b, &*x);
}
However, if we change the code to the following:
pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
let safe_a = unsafe { &mut *a };
let safe_b = unsafe { &mut *b };
let safe_x = unsafe { &*x };
*safe_a = *safe_x;
*safe_b = *safe_x;
}
then the extra dereference is not optimized out as it should be. For some reason, this optimization is missed in this situation. (note: I was comparing rustc 1.16.0-nightly (47c8d9fdc 2017-01-08) with flags -C opt-level=s)
The following code results in
abeing dereferenced again for the return value: (https://godbolt.org/z/YnrzMa3oj)pub unsafe fn g(a: *mut i32, b: *mut i32) -> i32 { *a = 0; *b = 1; return *a; }That is to be expected. In contrast, the same function with references avoids the load: (https://godbolt.org/z/YnrzMa3oj)
pub fn g_ref(a: &mut i32, b: &mut i32) -> i32 { *a = 0; *b = 1; return *a; }However, if we change the code to the following: (https://godbolt.org/z/YnrzMa3oj)
pub fn g(a: *mut i32, b: *mut i32, x: *const i32) { let a = &mut *a; let b = &mut *b; *a = 10; *b = 11; return *a; }then the extra dereference is not optimized out as it should be, because do not emit anything like
noaliasinside functions.Ralf Jung at 2024-12-14 11:58:21
It might be due to rust-lang/rust/issues/31681
bluss at 2017-01-09 11:02:04
Do you have a play link demonstrating noalias taking effect? AFAIK noalias annotations were completely removed after 1.7.0 due to the llvm bugs.
Here is your foo with some bits and pieces to defeat whole-program optimisation:
use std::process; fn main() { let r1 = foo(&mut 1, &mut 2, &3); let r2 = foo(&mut 1, &mut 2, &4); process::exit(r1 + r2) } #[inline(never)] pub fn foo(a: &mut i32, b: &mut i32, x: &i32) -> i32 { *a = *x; *b = *x; *a }https://is.gd/NsngDd
If you compile that in release mode and look at the LLVM IR, you'll see that foo does not have noalias annotations and performs the store twice.
Aidan Hobson Sayers at 2017-01-09 16:33:02
Here's a demo: https://godbolt.org/g/pSj51i
pub fn foo(a: &mut i32, b: &mut i32, x: &i32) { *a = *x; *b = *x; }compiles into:
example::foo: push rbp mov rbp, rsp mov eax, dword ptr [rdx] mov dword ptr [rdi], eax mov dword ptr [rsi], eax pop rbp retMichael Bradshaw at 2017-01-09 16:38:52
Right, sorry, I was being dim - you're referring to the load only happening once.
This is probably because immutable references are marked as readonly in llvm (take a look at the debug llvm IR of my example). If you make x be
&mutin your example, you'll see it does do the load twice despite it theoretically giving more information in this function (if all three were noalias, as implied by&mut, llvm could infer that x is readonly). Given this deoptimisation, noalias annotations must not be being emitted at all which agrees with my understanding of the current situation.Your last example has all pointer arguments which do not get annotated, even if they're
const. I don't know what the current state of annotating local variables is in rust, which is what that would require.Aidan Hobson Sayers at 2017-01-09 17:59:26
I don't think the
readonlyattribute makes a difference here. Or at least, it shouldn't. LLVM documents it as:On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.
It clearly states that the memory may be written to (but only through a different pointer, not through the argument marked as
readonly). Thus, it should have to load the pointer twice.AFAIK, this optimization should only be possible if
a,b, orxare marked asnoalias.Indeed, if you compile the following C code:
void foo(int *a, int *b, const int* x) { *a = *x; *b = *x; }The generated LLVM IR marks
xasreadonlybut still loads it twice (which is to be expected):; Function Attrs: norecurse nounwind ssp uwtable define void @foo(i32* nocapture, i32* nocapture, i32* nocapture readonly) #0 { %4 = load i32, i32* %2, align 4, !tbaa !2 store i32 %4, i32* %0, align 4, !tbaa !2 %5 = load i32, i32* %2, align 4, !tbaa !2 store i32 %5, i32* %1, align 4, !tbaa !2 ret void }Actually, your point about making
xan&muthints at the reason why the extra load is optimized out in the first place: ifxis an immutable reference, it is marked asnoalias; if it's a mutable reference, then it is not marked asnoalias(edit: on second thought, is it valid to have tworeadonly noaliaspointers that alias? They're bothnoalias, but at the same time neither will be used to modify the data... I'm not actually sure how LLVM would respond in this case). I believe this is due to the issue bluss linked to: rust-lang/rust#31681However, it doesn't answer the question why
safe_xdoesn't get marked asnoalias, despite being an immutable reference. I believesafe_xshould be marked asnoalias.Michael Bradshaw at 2017-01-09 18:31:31
Good issue. noalias on &mut is unrelated, sorry for the misdirection.
I'd look at
- Is rustc emitting
noaliaswhere it can? (Or other relevant metadata) - Is this metadata lost in some optimization pass? (Example: #37945)
I'm not sure there is a place where noalias can be placed
For this code:
#![crate_type="lib"] pub fn foo(a: &mut i32, b: &mut i32, x: *const i32) { unsafe { let r = &*x; *a = *r; *b = *r; } }This is produced (
rustc --emit=llvm-ir noalias.rs). It looks like there is no place where noalias can be used, except if it were back traced to the function argument for this particular function body.; ModuleID = 'noalias.cgu-0.rs' source_filename = "noalias.cgu-0.rs" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: uwtable define void @_ZN7noalias3foo17h1c56d659e6e1ab4bE(i32* dereferenceable(4), i32* dereferenceable(4), i32*) unnamed_addr #0 { entry-block: %_0 = alloca {} br label %start start: ; preds = %entry-block %3 = load i32, i32* %2 store i32 %3, i32* %0 %4 = load i32, i32* %2 store i32 %4, i32* %1 ret void } attributes #0 = { uwtable }bluss at 2017-01-09 19:14:56
- Is rustc emitting
This may be what you're looking for - #16515
Aidan Hobson Sayers at 2017-01-09 21:19:05
Perhaps someone can tag this issue as performance related? How can non-maintainers help to do such routine work? I vaguely remember @TimNN doing this a lot for issues here.
Sander Maijers at 2017-02-19 16:38:22
How can non-maintainers help to do such routine work?
Unfortunately, GitHub doesn't let anyone do issue triage without having a commit bit, it's kind of annoying.
If you email me a list of issues and changes that should be made, I can manually do it on your behalf, for now. It's not great. I'd like to solve this somehow, but there's so much work to do...
Steve Klabnik at 2017-02-20 20:04:12
Off-topic @steveklabnik: It's not so much a request from me, but a general question. Perhaps there is some way using the GitHub API and crowd sourced tagging.
Sander Maijers at 2017-02-20 20:11:32
So in which way is this issue not a duplicate of #16515?
Cc @rust-lang/wg-codegen
Anthony Ramine at 2018-04-02 11:23:43
So in which way is this issue not a duplicate of #16515?
In the related Stack Overflow question by the OP, an answer was provided:
pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) { (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| { *safe_a = *safe_x; *safe_b = *safe_x; })(&mut *a, &mut *b, &*x) }This results in the LLVM IR:
; playground::f1 ; Function Attrs: nounwind uwtable define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 { start: %0 = icmp ne i32* %a, null tail call void @llvm.assume(i1 %0) %1 = icmp ne i32* %b, null tail call void @llvm.assume(i1 %1) %2 = icmp ne i32* %x, null tail call void @llvm.assume(i1 %2) %x.val = load i32, i32* %x, align 4 store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3 store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0 ret void }Note that this contains
!noaliasandassume.The "normal" way of writing this:
pub unsafe fn f2(a: *mut i32, b: *mut i32, x: *const i32) { let safe_a: &mut i32 = &mut *a; let safe_b: &mut i32 = &mut *b; let safe_x: &i32 = &*x; *safe_a = *safe_x; *safe_b = *safe_x; }Has no such metadata:
define void @_ZN10playground2f217h6945a283c9ef76d3E(i32* nocapture %a, i32* nocapture %b, i32* nocapture readonly %x) unnamed_addr #0 { start: %0 = load i32, i32* %x, align 4 store i32 %0, i32* %a, align 4 %1 = load i32, i32* %x, align 4 store i32 %1, i32* %b, align 4 ret void }It appears that noalias information only applies at function boundaries.
Jake Goulding at 2018-09-21 13:51:58
Do we have an issue for
!alias.scopeand!noalias? cc @rkruppeEduard-Mihai Burtescu at 2018-09-22 06:42:32
@eddyb An issue for
!alias.scopewas linked above: https://github.com/rust-lang/rust/issues/16515Simonas Kazlauskas at 2018-09-22 06:49:41
@nagisa Oops, the way I read the start of the comment, I didn't think to check, my bad.
Eduard-Mihai Burtescu at 2018-09-22 07:46:11
#16515 is titled "make use of LLVM's scoped noalias metadata"
We are making use of it, for function arguments. We are not making use of it in every case. If the issues are the same, perhaps one of y'all would be good enough to improve the title of that issue to make it more discoverable, and maybe update it to have a list of places that should use the metadata and don't. Otherwise, it reads a lot like a nebulous "make the compiler make faster code" issue.
Jake Goulding at 2018-09-22 21:54:08
"metadata" is not function argument/return attributes, but rather the
!foothings on instructions, and we don't ever emit!noaliasAFAIK.Eduard-Mihai Burtescu at 2018-09-23 01:29:21
and we don't ever emit
!noaliasAFAIK.I am missing something, please help me understand what. There's literally
!noaliasin the LLVM IR I pasted earlier:pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) { (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| { *safe_a = *safe_x; *safe_b = *safe_x; })(&mut *a, &mut *b, &*x) }This results in the LLVM IR:
; playground::f1 ; Function Attrs: nounwind uwtable define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 { start: %0 = icmp ne i32* %a, null tail call void @llvm.assume(i1 %0) %1 = icmp ne i32* %b, null tail call void @llvm.assume(i1 %1) %2 = icmp ne i32* %x, null tail call void @llvm.assume(i1 %2) %x.val = load i32, i32* %x, align 4 store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3 store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0 ret void }Zooming and enhancing:
<code><pre>store i32 %x.val, i32* %a, align 4, !alias.scope !0, <b>!noalias !3</b> store i32 %x.val, i32* %b, align 4, !alias.scope !3, <b>!noalias !0</b></pre></code>
Jake Goulding at 2018-09-23 12:59:01
@shepmaster LLVM generates that itself after inlining
noaliasattributes (which are much easier to generate, as!alias.scopeis not involved, and we've had them for ages). The "smoking gun" isllvm.assumebeing called (it comes fromnonnull/dereferenceable) - we try to avoid callingllvm.assume, as it has, historically, slowed down/broken optimizations.One more recent example is I wanted use to emit
alignon (pointer) arguments in #45225, but had to give up because some tests don't optimize, due to LLVM's inliner generatingllvm.assume's.Eduard-Mihai Burtescu at 2018-09-23 13:03:08
Nominated for compiler team discussion - how should we track these instances of #16515? I edited its description to reflect the current status, but can we do something more organized? Do we want to close and open a new tracking issue? (prompted by @shepmaster in https://github.com/rust-lang/rust/issues/38941#issuecomment-423776089)
Eduard-Mihai Burtescu at 2018-09-23 13:51:02
So we very briefly discussed this at the T-compiler meeting.
I'm not sure what path we want to use going forward to track work items like this.
One possibility is having separate issues .. and then making #16515 a metabug tracking them?
Another possibility is a separate github project. Not sure if that setup is helpful or harmful for a topic like this.
@eddyb were there any particular thoughts/ideas you had? Or would you simply be disposed to close this and open a fresh tracking issue, as you mentioned above?
Felix S Klock II at 2018-10-25 15:03:36
@pnkfelix I was specifically wondering if we should close #16515 or keep using it.
Eduard-Mihai Burtescu at 2018-11-05 00:03:06
The original example now produces, eliding the second read
f: mov eax, dword ptr [rdx] mov dword ptr [rdi], eax mov dword ptr [rsi], eax retas of Rust 1.78. Invoking with an aliasing
aandxcauses Miri to report UB. This is probably due to the LLVM bump in that versionclubby789 at 2024-10-19 23:35:21
We still don't emit any
noaliasfor code like that though, so it has to optimize assuming the pointers alias.Ralf Jung at 2024-10-20 05:49:10
Now (with current nightly)
pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) { *a = *x; *b = *x; }compiled (-O) to
f: mov eax, dword ptr [rdx] mov dword ptr [rdi], eax mov dword ptr [rsi], eax retso
xdeference only once. See https://godbolt.org/z/9WGxGsdzG .pub fn g(a: *mut i32, b: *mut i32, x: *const i32) { let safe_a = unsafe { &mut *a }; let safe_b = unsafe { &mut *b }; let safe_x = unsafe { &*x }; *safe_a = *safe_x; *safe_b = *safe_x; }produces the same assembly https://godbolt.org/z/oa557jqcj ,
so I suppose the issue can be closed?
Evgeniy Dushistov at 2024-12-14 11:48:59
The reason
*xis run only once is that due to alignment,*xand*aare either the exact same memory or disjoint. Of they are the same memory,*a = *xis a NOP and we can keep using the old value. If they are disjoint, then obviously we can keep using the old value.So, what we need here is a better example; the issue almost certainly is still present.
Ralf Jung at 2024-12-14 11:52:31
I have updated the example. But I am not sure if it's worth keeping the issue, this is basically a duplicate of https://github.com/rust-lang/rust/issues/16515.
Ralf Jung at 2024-12-14 11:58:58