vtable accesses optimization in a tight loop.
Opened by dpc at
This is a follow-up to #39992.
This playrust snippet:
#![crate_type="lib"]

use std::sync::Arc;

pub struct Wrapper {
    x: usize,
    y: usize,
    t: Arc<Foo>,
}

pub trait Foo {
    fn foo(&self);
}

pub fn test(foo: Wrapper) {
    for _ in 0..200 {
        foo.t.foo();
    }
}
Generates the tight loop:
.LBB1_1:
    incl %ebp
    cmpl $200, %ebp
    jge .LBB1_2
    movq 16(%rbx), %rdi
    movq 24(%rbx), %rax
    leaq 15(%rdi), %rcx
    negq %rdi
    andq %rcx, %rdi
    addq %r12, %rdi
.Ltmp12:
    callq *%rax
.Ltmp13:
    jmp .LBB1_1
while it seems to me the vtable access should be out of the loop, and it shouldn't be hard.
Strangely, changing the signature to take &Wrapper does hoist the vtable access:
.LBB0_1:
    inc ebp
    mov rdi, rbx
    call r14
    cmp ebp, 200
    jl .LBB0_1
Hanna Kruppe at 2017-03-07 10:41:25
Looks like the load is hoisted on nightly (but not beta):
.LBB2_1:                # =>This Inner Loop Header: Depth=1
    addl $1, %ebp
    cmpl $200, %ebp
    jae .LBB2_2
# %bb.4:                # in Loop: Header=BB2_1 Depth=1
    movq %rbx, %rdi
    callq *%r13
    jmp .LBB2_1
The &Wrapper variant still generates slightly better code:
.LBB0_1:                # =>This Inner Loop Header: Depth=1
    movq %rbx, %rdi
    callq *%r14
    addl $-1, %ebp
    jne .LBB0_1
Nikita Popov at 2018-12-02 20:04:48
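For anyone comparing the variants discussed above, here is a minimal sketch of the two source-level shapes: the by-reference signature mentioned in the comments, and a by-value version that borrows the trait object once before the loop. The names test_by_ref, test_hoisted and Counter are hypothetical and not from the thread, the fields are made pub only so the sketch can be exercised from outside, and Arc<dyn Foo> is used because current editions require the dyn keyword. Whether the manual borrow changes the generated code depends on the compiler version and was not measured here.

```rust
use std::cell::Cell;
use std::sync::Arc;

pub trait Foo {
    fn foo(&self);
}

pub struct Wrapper {
    pub x: usize,
    pub y: usize,
    // The thread's `Arc<Foo>` is spelled `Arc<dyn Foo>` in current Rust.
    pub t: Arc<dyn Foo>,
}

/// By-reference variant from the comments (the function name is hypothetical).
pub fn test_by_ref(foo: &Wrapper) {
    for _ in 0..200 {
        foo.t.foo();
    }
}

/// By-value variant that borrows the trait object once, so the data and
/// vtable pointers are read from the wrapper before the loop at the source
/// level. This is an assumption about what may help, not a guarantee about
/// the emitted assembly.
pub fn test_hoisted(foo: Wrapper) {
    let callee: &dyn Foo = &*foo.t;
    for _ in 0..200 {
        callee.foo();
    }
}

/// Hypothetical implementor that counts calls, used only to check behavior.
pub struct Counter {
    pub calls: Cell<usize>,
}

impl Foo for Counter {
    fn foo(&self) {
        self.calls.set(self.calls.get() + 1);
    }
}
```

Both functions call foo exactly 200 times, so they are interchangeable as far as observable behavior goes; any difference would only show up in the emitted loop.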