Missed optimization: references from pointers aren't treated as noalias

2c48f77
Opened by Michael Bradshaw at 2024-12-14 11:58:58

The following code results in x being dereferenced twice:

pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
  *a = *x;
  *b = *x;
}

That is to be expected. As far as I know, the Rust language spec doesn't give a way to mark unsafe raw pointers as noalias. It does, however, say that references are treated as noalias, so the following (correctly) optimizes out the extra dereference for both foo and bar:

pub fn foo(a: &mut i32, b: &mut i32, x: &i32) {
  *a = *x;
  *b = *x;
}

pub unsafe fn bar(a: *mut i32, b: *mut i32, x: *const i32) {
  foo(&mut *a, &mut *b, &*x);
}

However, if we change the code to the following:

pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
  let safe_a = unsafe { &mut *a };
  let safe_b = unsafe { &mut *b };
  let safe_x = unsafe { &*x };
  *safe_a = *safe_x;
  *safe_b = *safe_x;
}

then the extra dereference is not optimized out as it should be. For some reason, this optimization is missed in this situation. (note: I was comparing rustc 1.16.0-nightly (47c8d9fdc 2017-01-08) with flags -C opt-level=s)

  1. The following code results in a being dereferenced again for the return value: (https://godbolt.org/z/YnrzMa3oj)

    pub unsafe fn g(a: *mut i32, b: *mut i32) -> i32 {
      *a = 0;
      *b = 1;
      return *a;
    }
    

    That is to be expected. In contrast, the same function with references avoids the load: (https://godbolt.org/z/YnrzMa3oj)

    pub fn g_ref(a: &mut i32, b: &mut i32) -> i32 {
      *a = 0;
      *b = 1;
      return *a;
    }
    

    However, if we change the code to the following: (https://godbolt.org/z/YnrzMa3oj)

    pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
      let a = &mut *a;
      let b = &mut *b;
      *a = 10;
      *b = 11;
      return *a;
    }
    

    then the extra dereference is not optimized out as it should be, because do not emit anything like noalias inside functions.

    Ralf Jung at 2024-12-14 11:58:21

  2. It might be due to rust-lang/rust/issues/31681

    bluss at 2017-01-09 11:02:04

  3. Do you have a play link demonstrating noalias taking effect? AFAIK noalias annotations were completely removed after 1.7.0 due to the llvm bugs.

    Here is your foo with some bits and pieces to defeat whole-program optimisation:

    use std::process;
    
    fn main() {
        let r1 = foo(&mut 1, &mut 2, &3);
        let r2 = foo(&mut 1, &mut 2, &4);
        process::exit(r1 + r2)
    }
    
    #[inline(never)]
    pub fn foo(a: &mut i32, b: &mut i32, x: &i32) -> i32 {
      *a = *x;
      *b = *x;
      *a
    }
    

    https://is.gd/NsngDd

    If you compile that in release mode and look at the LLVM IR, you'll see that foo does not have noalias annotations and performs the store twice.

    Aidan Hobson Sayers at 2017-01-09 16:33:02

  4. Here's a demo: https://godbolt.org/g/pSj51i

    pub fn foo(a: &mut i32, b: &mut i32, x: &i32) {
      *a = *x;
      *b = *x;
    }
    

    compiles into:

    example::foo:
            push    rbp
            mov     rbp, rsp
            mov     eax, dword ptr [rdx]
            mov     dword ptr [rdi], eax
            mov     dword ptr [rsi], eax
            pop     rbp
            ret
    

    Michael Bradshaw at 2017-01-09 16:38:52

  5. Right, sorry, I was being dim - you're referring to the load only happening once.

    This is probably because immutable references are marked as readonly in llvm (take a look at the debug llvm IR of my example). If you make x be &mut in your example, you'll see it does do the load twice despite it theoretically giving more information in this function (if all three were noalias, as implied by &mut, llvm could infer that x is readonly). Given this deoptimisation, noalias annotations must not be being emitted at all which agrees with my understanding of the current situation.

    Your last example has all pointer arguments which do not get annotated, even if they're const. I don't know what the current state of annotating local variables is in rust, which is what that would require.

    Aidan Hobson Sayers at 2017-01-09 17:59:26

  6. I don't think the readonly attribute makes a difference here. Or at least, it shouldn't. LLVM documents it as:

    On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.

    It clearly states that the memory may be written to (but only through a different pointer, not through the argument marked as readonly). Thus, it should have to load the pointer twice.

    AFAIK, this optimization should only be possible if a, b, or x are marked as noalias.

    Indeed, if you compile the following C code:

    void foo(int *a, int *b, const int* x) {
        *a = *x;
        *b = *x;
    }
    

    The generated LLVM IR marks x as readonly but still loads it twice (which is to be expected):

    ; Function Attrs: norecurse nounwind ssp uwtable
    define void @foo(i32* nocapture, i32* nocapture, i32* nocapture readonly) #0 {
      %4 = load i32, i32* %2, align 4, !tbaa !2
      store i32 %4, i32* %0, align 4, !tbaa !2
      %5 = load i32, i32* %2, align 4, !tbaa !2
      store i32 %5, i32* %1, align 4, !tbaa !2
      ret void
    }
    

    Actually, your point about making x an &mut hints at the reason why the extra load is optimized out in the first place: if x is an immutable reference, it is marked as noalias; if it's a mutable reference, then it is not marked as noalias (edit: on second thought, is it valid to have two readonly noalias pointers that alias? They're both noalias, but at the same time neither will be used to modify the data... I'm not actually sure how LLVM would respond in this case). I believe this is due to the issue bluss linked to: rust-lang/rust#31681

    However, it doesn't answer the question why safe_x doesn't get marked as noalias, despite being an immutable reference. I believe safe_x should be marked as noalias.

    Michael Bradshaw at 2017-01-09 18:31:31

  7. Good issue. noalias on &mut is unrelated, sorry for the misdirection.

    I'd look at

    1. Is rustc emitting noalias where it can? (Or other relevant metadata)
    2. Is this metadata lost in some optimization pass? (Example: #37945)

    I'm not sure there is a place where noalias can be placed

    For this code:

    #![crate_type="lib"]
    
    pub fn foo(a: &mut i32, b: &mut i32, x: *const i32) {
        unsafe {
            let r = &*x;
            *a = *r;
            *b = *r;
        }
    }
    

    This is produced (rustc --emit=llvm-ir noalias.rs). It looks like there is no place where noalias can be used, except if it were back traced to the function argument for this particular function body.

    ; ModuleID = 'noalias.cgu-0.rs'
    source_filename = "noalias.cgu-0.rs"
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"
    
    ; Function Attrs: uwtable
    define void @_ZN7noalias3foo17h1c56d659e6e1ab4bE(i32* dereferenceable(4), i32* dereferenceable(4), i32*) unnamed_addr #0 {
    entry-block:
      %_0 = alloca {}
      br label %start
    
    start:                                            ; preds = %entry-block
      %3 = load i32, i32* %2
      store i32 %3, i32* %0
      %4 = load i32, i32* %2
      store i32 %4, i32* %1
      ret void
    }
    
    attributes #0 = { uwtable }
    

    bluss at 2017-01-09 19:14:56

  8. This may be what you're looking for - #16515

    Aidan Hobson Sayers at 2017-01-09 21:19:05

  9. Perhaps someone can tag this issue as performance related? How can non-maintainers help to do such routine work? I vaguely remember @TimNN doing this a lot for issues here.

    Sander Maijers at 2017-02-19 16:38:22

  10. How can non-maintainers help to do such routine work?

    Unfortunately, GitHub doesn't let anyone do issue triage without having a commit bit, it's kind of annoying.

    If you email me a list of issues and changes that should be made, I can manually do it on your behalf, for now. It's not great. I'd like to solve this somehow, but there's so much work to do...

    Steve Klabnik at 2017-02-20 20:04:12

  11. Off-topic @steveklabnik: It's not so much a request from me, but a general question. Perhaps there is some way using the GitHub API and crowd sourced tagging.

    Sander Maijers at 2017-02-20 20:11:32

  12. So in which way is this issue not a duplicate of #16515?

    Cc @rust-lang/wg-codegen

    Anthony Ramine at 2018-04-02 11:23:43

  13. So in which way is this issue not a duplicate of #16515?

    In the related Stack Overflow question by the OP, an answer was provided:

    pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
        (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| {
            *safe_a = *safe_x;
            *safe_b = *safe_x;
        })(&mut *a, &mut *b, &*x)
    }
    

    This results in the LLVM IR:

    ; playground::f1
    ; Function Attrs: nounwind uwtable
    define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 {
    start:
      %0 = icmp ne i32* %a, null
      tail call void @llvm.assume(i1 %0)
      %1 = icmp ne i32* %b, null
      tail call void @llvm.assume(i1 %1)
      %2 = icmp ne i32* %x, null
      tail call void @llvm.assume(i1 %2)
      %x.val = load i32, i32* %x, align 4
      store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3
      store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0
      ret void
    }
    

    Note that this contains !noalias and assume.

    The "normal" way of writing this:

    pub unsafe fn f2(a: *mut i32, b: *mut i32, x: *const i32) {
        let safe_a: &mut i32 = &mut *a;
        let safe_b: &mut i32 = &mut *b;
        let safe_x: &i32 = &*x;
    
        *safe_a = *safe_x;
        *safe_b = *safe_x;
    }
    

    Has no such metadata:

    define void @_ZN10playground2f217h6945a283c9ef76d3E(i32* nocapture %a, i32* nocapture %b, i32* nocapture readonly %x) unnamed_addr #0 {
    start:
      %0 = load i32, i32* %x, align 4
      store i32 %0, i32* %a, align 4
      %1 = load i32, i32* %x, align 4
      store i32 %1, i32* %b, align 4
      ret void
    }
    

    It appears that noalias information only applies at function boundaries.

    Jake Goulding at 2018-09-21 13:51:58

  14. Do we have an issue for !alias.scope and !noalias? cc @rkruppe

    Eduard-Mihai Burtescu at 2018-09-22 06:42:32

  15. @eddyb An issue for !alias.scope was linked above: https://github.com/rust-lang/rust/issues/16515

    Simonas Kazlauskas at 2018-09-22 06:49:41

  16. @nagisa Oops, the way I read the start of the comment, I didn't think to check, my bad.

    Eduard-Mihai Burtescu at 2018-09-22 07:46:11

  17. #16515 is titled "make use of LLVM's scoped noalias metadata"

    We are making use of it, for function arguments. We are not making use of it in every case. If the issues are the same, perhaps one of y'all would be good enough to improve the title of that issue to make it more discoverable, and maybe update it to have a list of places that should use the metadata and don't. Otherwise, it reads a lot like a nebulous "make the compiler make faster code" issue.

    Jake Goulding at 2018-09-22 21:54:08

  18. "metadata" is not function argument/return attributes, but rather the !foo things on instructions, and we don't ever emit !noalias AFAIK.

    Eduard-Mihai Burtescu at 2018-09-23 01:29:21

  19. and we don't ever emit !noalias AFAIK.

    I am missing something, please help me understand what. There's literally !noalias in the LLVM IR I pasted earlier:

    pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
        (|safe_a: &mut i32, safe_b: &mut i32, safe_x: &i32| {
            *safe_a = *safe_x;
            *safe_b = *safe_x;
        })(&mut *a, &mut *b, &*x)
    }
    

    This results in the LLVM IR:

    ; playground::f1
    ; Function Attrs: nounwind uwtable
    define void @_ZN10playground2f117h829c91c07d14df5dE(i32* %a, i32* %b, i32* readonly %x) unnamed_addr #1 {
    start:
      %0 = icmp ne i32* %a, null
      tail call void @llvm.assume(i1 %0)
      %1 = icmp ne i32* %b, null
      tail call void @llvm.assume(i1 %1)
      %2 = icmp ne i32* %x, null
      tail call void @llvm.assume(i1 %2)
      %x.val = load i32, i32* %x, align 4
      store i32 %x.val, i32* %a, align 4, !alias.scope !0, !noalias !3
      store i32 %x.val, i32* %b, align 4, !alias.scope !3, !noalias !0
      ret void
    }
    

    Zooming and enhancing:

    <code><pre>store i32 %x.val, i32* %a, align 4, !alias.scope !0, <b>!noalias !3</b> store i32 %x.val, i32* %b, align 4, !alias.scope !3, <b>!noalias !0</b></pre></code>

    Jake Goulding at 2018-09-23 12:59:01

  20. @shepmaster LLVM generates that itself after inlining noalias attributes (which are much easier to generate, as !alias.scope is not involved, and we've had them for ages). The "smoking gun" is llvm.assume being called (it comes from nonnull / dereferenceable) - we try to avoid calling llvm.assume, as it has, historically, slowed down/broken optimizations.

    One more recent example is I wanted use to emit align on (pointer) arguments in #45225, but had to give up because some tests don't optimize, due to LLVM's inliner generating llvm.assume's.

    Eduard-Mihai Burtescu at 2018-09-23 13:03:08

  21. Nominated for compiler team discussion - how should we track these instances of #16515? I edited its description to reflect the current status, but can we do something more organized? Do we want to close and open a new tracking issue? (prompted by @shepmaster in https://github.com/rust-lang/rust/issues/38941#issuecomment-423776089)

    Eduard-Mihai Burtescu at 2018-09-23 13:51:02

  22. So we very briefly discussed this at the T-compiler meeting.

    I'm not sure what path we want to use going forward to track work items like this.

    One possibility is having separate issues .. and then making #16515 a metabug tracking them?

    Another possibility is a separate github project. Not sure if that setup is helpful or harmful for a topic like this.

    @eddyb were there any particular thoughts/ideas you had? Or would you simply be disposed to close this and open a fresh tracking issue, as you mentioned above?

    Felix S Klock II at 2018-10-25 15:03:36

  23. @pnkfelix I was specifically wondering if we should close #16515 or keep using it.

    Eduard-Mihai Burtescu at 2018-11-05 00:03:06

  24. The original example now produces, eliding the second read

    f:
            mov     eax, dword ptr [rdx]
            mov     dword ptr [rdi], eax
            mov     dword ptr [rsi], eax
            ret
    

    as of Rust 1.78. Invoking with an aliasing a and x causes Miri to report UB. This is probably due to the LLVM bump in that version

    clubby789 at 2024-10-19 23:35:21

  25. We still don't emit any noalias for code like that though, so it has to optimize assuming the pointers alias.

    Ralf Jung at 2024-10-20 05:49:10

  26. Now (with current nightly)

    pub unsafe fn f(a: *mut i32, b: *mut i32, x: *const i32) {
      *a = *x;
      *b = *x;
    }
    

    compiled (-O) to

    f:
            mov     eax, dword ptr [rdx]
            mov     dword ptr [rdi], eax
            mov     dword ptr [rsi], eax
            ret
    

    so x deference only once. See https://godbolt.org/z/9WGxGsdzG .

    pub fn g(a: *mut i32, b: *mut i32, x: *const i32) {
      let safe_a = unsafe { &mut *a };
      let safe_b = unsafe { &mut *b };
      let safe_x = unsafe { &*x };
      *safe_a = *safe_x;
      *safe_b = *safe_x;
    }
    

    produces the same assembly https://godbolt.org/z/oa557jqcj ,

    so I suppose the issue can be closed?

    Evgeniy Dushistov at 2024-12-14 11:48:59

  27. The reason *x is run only once is that due to alignment, *x and *a are either the exact same memory or disjoint. Of they are the same memory, *a = *x is a NOP and we can keep using the old value. If they are disjoint, then obviously we can keep using the old value.

    So, what we need here is a better example; the issue almost certainly is still present.

    Ralf Jung at 2024-12-14 11:52:31

  28. I have updated the example. But I am not sure if it's worth keeping the issue, this is basically a duplicate of https://github.com/rust-lang/rust/issues/16515.

    Ralf Jung at 2024-12-14 11:58:58