Non-optimal derived PartialEq for POD struct
Given a struct like
pub struct Foo {
a: bool,
b: bool,
c: bool,
d: bool
}
I would expect generated asm to be doing a dword comparison to check for equality. However, that's not what is seen unless explicitly transmuting before the comparison (interactive: https://godbolt.org/g/kYPaOh).
The PartialEq implementation looks something like:
pub fn eq(a: &Foo, b: &Foo) -> bool {
a.a == b.a &&
a.b == b.b &&
a.c == b.c &&
a.d == b.d
}
which produces the assembly:
example::eq:
push rbp
mov rbp, rsp
mov al, byte ptr [rdi]
cmp al, byte ptr [rsi]
jne .LBB0_1
mov al, byte ptr [rdi + 1]
cmp al, byte ptr [rsi + 1]
jne .LBB0_1
mov al, byte ptr [rdi + 2]
cmp al, byte ptr [rsi + 2]
jne .LBB0_1
mov cl, byte ptr [rdi + 3]
mov al, 1
cmp cl, byte ptr [rsi + 3]
je .LBB0_3
.LBB0_1:
xor eax, eax
.LBB0_3:
pop rbp
ret
An optimized version using transmute (Rust):
pub fn eq_transmute(a: &Foo, b: &Foo) -> bool {
unsafe {
transmute_copy::<_, u32>(a) == transmute_copy::<_, u32>(b)
}
}
produces the assembly:
example::eq_transmute:
push rbp
mov rbp, rsp
mov eax, dword ptr [rdi]
cmp eax, dword ptr [rsi]
sete al
pop rbp
ret
Is this an optimization we should expect to be handled automatically?
I'd say this is a fairly important missed optimization.
Val Markovic at 2017-05-08 21:51:34
It seems that this optimization simply doesn't exist in LLVM today. The IR we generate seems conductive to such a transformation: we're loading and comparing i8s (though this is partly due to a workaround for an optimizer bug), the loads have range metadata (and it's preserved through the optimization pipeline), the
&Fooarguments have thedereferencable(4)attribute, etc.However, there's an issue with thi soptimization: the dword pointers would be (potentially) unaligned, which is fine and probably carries little or no performance penalty on x86, but is a big problem on other platforms. So if you want portable performance, you can't rely on this optimization, at least not without forcing alignment of 4 on the struct.
This is probably also why LLVM doesn't do the optimization — its mid-level optimizers have only limited knowledge of how well unaligned loads and stores work, and so they're probably extremely averse to introducing them. From glancing over the source code, the "load combining" pass seems to bail out if the combined load would require alignment larger than the load that's being widened (in this case,
a.aandb.b).So one way we might be able to help LLVM with this is to emit larger alignments on memory operations when possible (e.g., align N on a load of the first field of an align(N) struct).
Hanna Kruppe at 2017-05-09 08:03:48
~~By the way, won't such optimization break in case of
Option<Foo>& similar?FooandOption<Foo>have the same size, soOptiondefinitely stores it's discriminant inFoo, then transmuting would also see the discriminant, doesn't it?~~Nevermind. If the value is
Some(_)then all thebools must be in the consistent0or1state, onlyNonecan be represented as1in a place where it can't be otherwise sinceNonedoesn't allow you to get&Foo. If you have a&Foothen the value must have a correct layout, includingbools being either1or0so just comparing the word should work correctly modulo what is said by others in this thread (e.g.: alignment)waffle at 2020-07-26 21:10:42
LLVM does generally perform this optimization (MergeICmps), the problem here is specifically the bool types involved. We're not comparing the bytes, but rather doing something like
(a != 0) == (b != 0).Nikita Popov at 2022-03-18 08:48:16
fixed since
rustc 1.82.0: https://rust.godbolt.org/z/vbbM6h3Mv@rustbot label +e-needs-test
Veera at 2025-02-25 01:00:41