Match blocks generate inefficient assembly compared to if/else chain since rust 1.65.0

31e6d4f
Opened by Mahmoud Al-Qudsi at 2023-04-27 17:31:30

I would assume that the following two snippets of code would compile to identical assembly

pub fn convert_hex(d: char) -> Option<u32> {
    if ('0'..='9').contains(&d) {
        Some(u32::from(d) - u32::from('0'))
    } else if ('A'..='Z').contains(&d) {
        Some(10 + u32::from(d) - u32::from('A'))
    } else {
        None
    }
}

pub fn convert_hex2(c: char) -> Option<u32> {
    match c {
        '0'..='9' => Some(u32::from(c) - u32::from('0')),
        'A'..='Z' => Some(10 + u32::from(c) - u32::from('A')),
        _ => None,
    }
}

However, as of rustc 1.65.0 convert_hex2 with the match blocks generates suboptimal code that is approximately 30% slower in a microbenchmark. Godbolt link showing asm comparison between 1.64.0 vs 1.69.0.

I don't think it's an LLVM regression since the emitted IR differs between 1.64 and 1.65: https://rust.godbolt.org/z/7b3EbGfj3

Version it worked on

It most recently worked on: 1.64.0

Version with regression

1.65.0-nightly (as of 1.69.0)

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-codegen +C-bug +T-compiler +I-slow

I'll try to bisect the commit behind the change.

  1. I would assume that the following two snippets of code would compile to identical assembly

    pub fn convert_hex(d: char) -> Option<u32> {
        if ('0'..='9').contains(&d) {
            Some(u32::from(d) - u32::from('0'))
        } else if ('A'..='Z').contains(&d) {
            Some(10 + u32::from(d) - u32::from('A'))
        } else {
            None
        }
    }
    
    pub fn convert_hex2(c: char) -> Option<u32> {
        match c {
            '0'..='9' => Some(u32::from(c) - u32::from('0')),
            'A'..='Z' => Some(10 + u32::from(c) - u32::from('A')),
            _ => None,
        }
    }
    

    However, as of rustc 1.65.0 convert_hex2 with the match blocks generates suboptimal code that is approximately 30% slower in a microbenchmark. Godbolt link showing asm comparison between 1.64.0 vs 1.69.0.

    I don't think it's an LLVM regression since the emitted IR differs between 1.64 and 1.65: https://rust.godbolt.org/z/7b3EbGfj3

    Version it worked on

    It most recently worked on: 1.64.0

    Version with regression

    1.65.0 - nightly (as of 1.69.0)
    

    @rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-codegen +C-bug +T-compiler +I-slow

    I'll try to bisect the commit behind the change.

    Mahmoud Al-Qudsi at 2023-04-23 19:31:48

  2. found 8 bors merge commits in the specified range
      commit[0] 2022-07-01: Auto merge of #93967 - cjgillot:short-struct-span, r=petrochenkov
      commit[1] 2022-07-01: Auto merge of #98781 - GuillaumeGomez:rollup-798kb8u, r=GuillaumeGomez
      commit[2] 2022-07-02: Auto merge of #98791 - cuviper:rogue-binary, r=compiler-errors
      commit[3] 2022-07-02: Auto merge of #98802 - Dylan-DPC:rollup-u6mwx27, r=Dylan-DPC
      commit[4] 2022-07-02: Auto merge of #91743 - cjgillot:enable_mir_inlining_inline_all, r=oli-obk
      commit[5] 2022-07-02: Auto merge of #97235 - nbdd0121:unwind, r=Amanieu
      commit[6] 2022-07-02: Auto merge of #97585 - lqd:const-alloc-intern, r=RalfJung
      commit[7] 2022-07-02: Auto merge of #98820 - RalfJung:rollup-i3mip9a, r=RalfJung
    ERROR: no CI builds available between 46b8c23f3eb5e4d0e0aa27eb3f20d5b8fc3ed51f and f2d93935ffba3ab9d7ccb5300771a2d29b4c8bf3 within last 167 days
    

    #91743 seems the most likely

    clubby789 at 2023-04-24 00:04:01

  3. I agree that #91743 is the most likely of those commits, but I can't see why inlining the MIR would result in such suboptimal codegen. I can see how inlining results in the two separate declarations that aren't merged but that doesn't explain why the extra memory offset calculation gets done. @cjgillot do you have any thoughts?

    Mahmoud Al-Qudsi at 2023-04-26 18:59:38

  4. WG-prioritization assigning priority (Zulip discussion).

    @rustbot label -I-prioritize +P-medium

    apiraino at 2023-04-27 10:29:45

  5. Doublechecked with -Zinline-mir=no and the regression indeed goes away.

    I can't see why inlining the MIR would result in such suboptimal codegen

    Inlining can lead to subtle changes in code structure that means opts run differently, or a pattern is no longer caught

    clubby789 at 2023-04-27 14:36:49