Non-optimal derived PartialEq for POD struct

3783bfa
Opened by Joe Wilm at 2025-02-25 01:00:45

Given a struct like

pub struct Foo {
    a: bool,
    b: bool,
    c: bool,
    d: bool
}

I would expect generated asm to be doing a dword comparison to check for equality. However, that's not what is seen unless explicitly transmuting before the comparison (interactive: https://godbolt.org/g/kYPaOh).

The PartialEq implementation looks something like:

pub fn eq(a: &Foo, b: &Foo) -> bool {
  a.a == b.a &&
    a.b == b.b &&
    a.c == b.c &&
    a.d == b.d
}

which produces the assembly:

example::eq:
        push    rbp
        mov     rbp, rsp
        mov     al, byte ptr [rdi]
        cmp     al, byte ptr [rsi]
        jne     .LBB0_1
        mov     al, byte ptr [rdi + 1]
        cmp     al, byte ptr [rsi + 1]
        jne     .LBB0_1
        mov     al, byte ptr [rdi + 2]
        cmp     al, byte ptr [rsi + 2]
        jne     .LBB0_1
        mov     cl, byte ptr [rdi + 3]
        mov     al, 1
        cmp     cl, byte ptr [rsi + 3]
        je      .LBB0_3
.LBB0_1:
        xor     eax, eax
.LBB0_3:
        pop     rbp
        ret

An optimized version using transmute (Rust):

pub fn eq_transmute(a: &Foo, b: &Foo) -> bool {
    unsafe {
        transmute_copy::<_, u32>(a) == transmute_copy::<_, u32>(b)
    }
}

produces the assembly:

example::eq_transmute:
        push    rbp
        mov     rbp, rsp
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsi]
        sete    al
        pop     rbp
        ret

Is this an optimization we should expect to be handled automatically?

  1. I'd say this is a fairly important missed optimization.

    Val Markovic at 2017-05-08 21:51:34

  2. It seems that this optimization simply doesn't exist in LLVM today. The IR we generate seems conductive to such a transformation: we're loading and comparing i8s (though this is partly due to a workaround for an optimizer bug), the loads have range metadata (and it's preserved through the optimization pipeline), the &Foo arguments have the dereferencable(4) attribute, etc.

    However, there's an issue with thi soptimization: the dword pointers would be (potentially) unaligned, which is fine and probably carries little or no performance penalty on x86, but is a big problem on other platforms. So if you want portable performance, you can't rely on this optimization, at least not without forcing alignment of 4 on the struct.

    This is probably also why LLVM doesn't do the optimization — its mid-level optimizers have only limited knowledge of how well unaligned loads and stores work, and so they're probably extremely averse to introducing them. From glancing over the source code, the "load combining" pass seems to bail out if the combined load would require alignment larger than the load that's being widened (in this case, a.a and b.b).

    So one way we might be able to help LLVM with this is to emit larger alignments on memory operations when possible (e.g., align N on a load of the first field of an align(N) struct).

    Hanna Kruppe at 2017-05-09 08:03:48

  3. ~~By the way, won't such optimization break in case of Option<Foo> & similar? Foo and Option<Foo> have the same size, so Option definitely stores it's discriminant in Foo, then transmuting would also see the discriminant, doesn't it?~~

    Nevermind. If the value is Some(_) then all the bools must be in the consistent 0 or 1 state, only None can be represented as 1 in a place where it can't be otherwise since None doesn't allow you to get &Foo. If you have a &Foo then the value must have a correct layout, including bools being either 1 or 0 so just comparing the word should work correctly modulo what is said by others in this thread (e.g.: alignment)

    waffle at 2020-07-26 21:10:42

  4. LLVM does generally perform this optimization (MergeICmps), the problem here is specifically the bool types involved. We're not comparing the bytes, but rather doing something like (a != 0) == (b != 0).

    Nikita Popov at 2022-03-18 08:48:16

  5. fixed since rustc 1.82.0: https://rust.godbolt.org/z/vbbM6h3Mv

    @rustbot label +e-needs-test

    Veera at 2025-02-25 01:00:41