Match blocks generate inefficient assembly compared to if/else chain since rust 1.65.0
I would assume that the following two snippets of code would compile to identical assembly
pub fn convert_hex(d: char) -> Option<u32> {
if ('0'..='9').contains(&d) {
Some(u32::from(d) - u32::from('0'))
} else if ('A'..='Z').contains(&d) {
Some(10 + u32::from(d) - u32::from('A'))
} else {
None
}
}
pub fn convert_hex2(c: char) -> Option<u32> {
match c {
'0'..='9' => Some(u32::from(c) - u32::from('0')),
'A'..='Z' => Some(10 + u32::from(c) - u32::from('A')),
_ => None,
}
}
However, as of rustc 1.65.0 convert_hex2 with the match blocks generates suboptimal code that is approximately 30% slower in a microbenchmark. Godbolt link showing asm comparison between 1.64.0 vs 1.69.0.
I don't think it's an LLVM regression since the emitted IR differs between 1.64 and 1.65: https://rust.godbolt.org/z/7b3EbGfj3
Version it worked on
It most recently worked on: 1.64.0
Version with regression
1.65.0-nightly (as of 1.69.0)
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-codegen +C-bug +T-compiler +I-slow
I'll try to bisect the commit behind the change.
I would assume that the following two snippets of code would compile to identical assembly
pub fn convert_hex(d: char) -> Option<u32> { if ('0'..='9').contains(&d) { Some(u32::from(d) - u32::from('0')) } else if ('A'..='Z').contains(&d) { Some(10 + u32::from(d) - u32::from('A')) } else { None } } pub fn convert_hex2(c: char) -> Option<u32> { match c { '0'..='9' => Some(u32::from(c) - u32::from('0')), 'A'..='Z' => Some(10 + u32::from(c) - u32::from('A')), _ => None, } }However, as of rustc 1.65.0
convert_hex2with thematchblocks generates suboptimal code that is approximately 30% slower in a microbenchmark. Godbolt link showing asm comparison between 1.64.0 vs 1.69.0.I don't think it's an LLVM regression since the emitted IR differs between 1.64 and 1.65: https://rust.godbolt.org/z/7b3EbGfj3
Version it worked on
It most recently worked on: 1.64.0
Version with regression
1.65.0 - nightly (as of 1.69.0)@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-codegen +C-bug +T-compiler +I-slow
I'll try to bisect the commit behind the change.
Mahmoud Al-Qudsi at 2023-04-23 19:31:48
found 8 bors merge commits in the specified range commit[0] 2022-07-01: Auto merge of #93967 - cjgillot:short-struct-span, r=petrochenkov commit[1] 2022-07-01: Auto merge of #98781 - GuillaumeGomez:rollup-798kb8u, r=GuillaumeGomez commit[2] 2022-07-02: Auto merge of #98791 - cuviper:rogue-binary, r=compiler-errors commit[3] 2022-07-02: Auto merge of #98802 - Dylan-DPC:rollup-u6mwx27, r=Dylan-DPC commit[4] 2022-07-02: Auto merge of #91743 - cjgillot:enable_mir_inlining_inline_all, r=oli-obk commit[5] 2022-07-02: Auto merge of #97235 - nbdd0121:unwind, r=Amanieu commit[6] 2022-07-02: Auto merge of #97585 - lqd:const-alloc-intern, r=RalfJung commit[7] 2022-07-02: Auto merge of #98820 - RalfJung:rollup-i3mip9a, r=RalfJung ERROR: no CI builds available between 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f and f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3 within last 167 days#91743 seems the most likely
clubby789 at 2023-04-24 00:04:01
I agree that #91743 is the most likely of those commits, but I can't see why inlining the MIR would result in such suboptimal codegen. I can see how inlining results in the two separate declarations that aren't merged but that doesn't explain why the extra memory offset calculation gets done. @cjgillot do you have any thoughts?
Mahmoud Al-Qudsi at 2023-04-26 18:59:38
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-medium
apiraino at 2023-04-27 10:29:45
Doublechecked with
-Zinline-mir=noand the regression indeed goes away.I can't see why inlining the MIR would result in such suboptimal codegen
Inlining can lead to subtle changes in code structure that means opts run differently, or a pattern is no longer caught
clubby789 at 2023-04-27 14:36:49