128-bit integer division with remainder is not combined to a single operation

f4b7d09
Opened by Henning Ottesen at 2023-04-05 17:38:50

Rustc fails to combine div() and mod() operations with u128 to a single division with remainder on x64. The following code compiles down to separate calls to __udivti3() and __umodti3() when we would prefer a single call to __udivmodti4().

#![feature(i128_type, i128, test)]
extern crate test;

#[inline(never)]
fn divmod_u128(a: u128, b: u128) -> (u128, u128) {
    (a / b, a % b)
}

fn main() {
    println!("{:?}", divmod_u128(std::u128::MAX, test::black_box(1000)));
}

Related to #39078. Relevant in the Display implementation for i128 and u128.

  1. Note that the equivalent program in C:

    typedef struct {
      unsigned __int128 a;
      unsigned __int128 b;
    } UInt128Pair;
    
    UInt128Pair divmod(unsigned __int128 a, unsigned __int128 b) {
      UInt128Pair result;
      result.a = a / b;
      result.b = a % b;
      return result;
    }
    

    Does not generate __udivmodti4 either with clang 5:

    <details><summary>x86_64 assembly</summary>
    divmod:                                 # @divmod
            push    rbp
            push    r15
            push    r14
            push    r13
            push    r12
            push    rbx
            push    rax
            mov     r14, r8
            mov     r15, rcx
            mov     r12, rdx
            mov     r13, rsi
            mov     rbx, rdi
    
            mov     rdi, r13
            mov     rsi, r12
            mov     rdx, r15
            mov     rcx, r14
            call    __udivti3
            mov     qword ptr [rsp], rax    # 8-byte Spill
            mov     rbp, rdx
    
            mov     rdi, r13
            mov     rsi, r12
            mov     rdx, r15
            mov     rcx, r14
            call    __umodti3
    
            mov     qword ptr [rbx + 8], rbp
            mov     rcx, qword ptr [rsp]    # 8-byte Reload
            mov     qword ptr [rbx], rcx
            mov     qword ptr [rbx + 24], rdx
            mov     qword ptr [rbx + 16], rax
    
            mov     rax, rbx
            add     rsp, 8
            pop     rbx
            pop     r12
            pop     r13
            pop     r14
            pop     r15
            pop     rbp
            ret
    
    </details>

     

    clang (trunk) does not call __umodti3, but computes it using a % b == a - (a / b) * b 😰, which behavior we will likely pick up when upgrading to LLVM 5.0.1 or 6.0.0.

    <details><summary>x86_64 assembly</summary>
    divmod:                                 # @divmod
            push    r15
            push    r14
            push    r13
            push    r12
            push    rbx
            mov     r14, r8
            mov     rbx, rcx
            mov     r15, rdx
            mov     r12, rsi
            mov     r13, rdi
    
            mov     rdi, r12
            mov     rsi, r15
            mov     rdx, rbx
            mov     rcx, r14
            call    __udivti3
            mov     rcx, rax
            mov     rsi, rdx
    
            imul    r14, rcx
            mov     rax, rcx
            mul     rbx
            add     rdx, r14
            imul    rbx, rsi
            add     rbx, rdx
            sub     r12, rax
            sbb     r15, rbx
    
            mov     qword ptr [r13 + 8], rsi
            mov     qword ptr [r13], rcx
            mov     qword ptr [r13 + 16], r12
            mov     qword ptr [r13 + 24], r15
    
            mov     rax, r13
            pop     rbx
            pop     r12
            pop     r13
            pop     r14
            pop     r15
            ret
    
    </details>

     

    ICC 17 also won't combine the calls. GCC 7 does generate __udivmodti4. (GCC 6 still result in two function calls.)

    kennytm at 2017-09-13 16:40:09

  2. The saddest part of the story is probably the implementations of __udivti3 and __umodti3 both call __udivmodti4 internally...

    I guess its more up to LLVM to fix the issue than to Rust, as from what I saw in the LLVM IR specification there is no combined "divrem" LLVM instruction to give modulo and the quotient.

    est31 at 2017-09-13 19:21:27

  3. That’s disappointing. Is it possible to manually call __udivmodti4()?

    Henning Ottesen at 2017-09-13 20:06:51

  4. That would be rust-lang/rfcs#914.

    kennytm at 2017-09-13 20:22:30

  5. @henninglive sort of you could. My first impression: it would be more of a hack... Idk, what do others think about this?

    est31 at 2017-09-13 20:25:20

  6. Still not generating udivmodti4: https://rust.godbolt.org/z/p-CxYF This is doing udivti3 + mul + sub.

    Nikita Popov at 2019-10-31 20:38:51

  7. danlark1 opened an LLVM issue for this.

    Aaron Kutch at 2020-09-12 02:49:53