128-bit integer division with remainder is not combined to a single operation
Rustc fails to combine div() and mod() operations with u128 to a single division with remainder on x64. The following code compiles down to separate calls to __udivti3() and __umodti3() when we would prefer a single call to __udivmodti4().
#![feature(i128_type, i128, test)]
extern crate test;
#[inline(never)]
fn divmod_u128(a: u128, b: u128) -> (u128, u128) {
(a / b, a % b)
}
fn main() {
println!("{:?}", divmod_u128(std::u128::MAX, test::black_box(1000)));
}
Related to #39078. Relevant in the Display implementation for i128 and u128.
Note that the equivalent program in C:
typedef struct { unsigned __int128 a; unsigned __int128 b; } UInt128Pair; UInt128Pair divmod(unsigned __int128 a, unsigned __int128 b) { UInt128Pair result; result.a = a / b; result.b = a % b; return result; }Does not generate
<details><summary>x86_64 assembly</summary>__udivmodti4either with clang 5:
</details>divmod: # @divmod push rbp push r15 push r14 push r13 push r12 push rbx push rax mov r14, r8 mov r15, rcx mov r12, rdx mov r13, rsi mov rbx, rdi mov rdi, r13 mov rsi, r12 mov rdx, r15 mov rcx, r14 call __udivti3 mov qword ptr [rsp], rax # 8-byte Spill mov rbp, rdx mov rdi, r13 mov rsi, r12 mov rdx, r15 mov rcx, r14 call __umodti3 mov qword ptr [rbx + 8], rbp mov rcx, qword ptr [rsp] # 8-byte Reload mov qword ptr [rbx], rcx mov qword ptr [rbx + 24], rdx mov qword ptr [rbx + 16], rax mov rax, rbx add rsp, 8 pop rbx pop r12 pop r13 pop r14 pop r15 pop rbp retclang (trunk) does not call
<details><summary>x86_64 assembly</summary>__umodti3, but computes it usinga % b == a - (a / b) * b😰, which behavior we will likely pick up when upgrading to LLVM 5.0.1 or 6.0.0.
</details>divmod: # @divmod push r15 push r14 push r13 push r12 push rbx mov r14, r8 mov rbx, rcx mov r15, rdx mov r12, rsi mov r13, rdi mov rdi, r12 mov rsi, r15 mov rdx, rbx mov rcx, r14 call __udivti3 mov rcx, rax mov rsi, rdx imul r14, rcx mov rax, rcx mul rbx add rdx, r14 imul rbx, rsi add rbx, rdx sub r12, rax sbb r15, rbx mov qword ptr [r13 + 8], rsi mov qword ptr [r13], rcx mov qword ptr [r13 + 16], r12 mov qword ptr [r13 + 24], r15 mov rax, r13 pop rbx pop r12 pop r13 pop r14 pop r15 retICC 17 also won't combine the calls. GCC 7 does generate
__udivmodti4. (GCC 6 still result in two function calls.)kennytm at 2017-09-13 16:40:09
The saddest part of the story is probably the implementations of
__udivti3and__umodti3both call__udivmodti4internally...I guess its more up to LLVM to fix the issue than to Rust, as from what I saw in the LLVM IR specification there is no combined "divrem" LLVM instruction to give modulo and the quotient.
est31 at 2017-09-13 19:21:27
That’s disappointing. Is it possible to manually call
__udivmodti4()?Henning Ottesen at 2017-09-13 20:06:51
That would be rust-lang/rfcs#914.
kennytm at 2017-09-13 20:22:30
@henninglive sort of you could. My first impression: it would be more of a hack... Idk, what do others think about this?
est31 at 2017-09-13 20:25:20
Still not generating udivmodti4: https://rust.godbolt.org/z/p-CxYF This is doing udivti3 + mul + sub.
Nikita Popov at 2019-10-31 20:38:51
danlark1 opened an LLVM issue for this.
Aaron Kutch at 2020-09-12 02:49:53