Trait object calls not devirtualized
trait A {
fn k(&self, x: i32) -> i32;
}
pub struct O {
}
pub struct P {
}
impl A for O {
fn k(&self, x: i32) -> i32 {
x
}
}
impl A for P {
fn k(&self, x: i32) -> i32 {
x
}
}
pub enum R {
P(P),
O(O),
}
impl R {
fn inner(&self) -> &A {
match self {
&R::P(ref p) => p,
&R::O(ref o) => o,
}
}
pub fn k(&self, x: i32) -> i32 {
match self {
&R::P(ref p) => p.k(x),
&R::O(ref o) => o.k(x),
}
}
pub fn j(&self, x: i32) -> i32 {
self.inner().k(x)
}
}
compiles to
core::ptr::drop_in_place:
push rbp
mov rbp, rsp
pop rbp
ret
core::ptr::drop_in_place:
push rbp
mov rbp, rsp
pop rbp
ret
<example::O as example::A>::k:
push rbp
mov rbp, rsp
mov eax, esi
pop rbp
ret
<example::P as example::A>::k:
push rbp
mov rbp, rsp
mov eax, esi
pop rbp
ret
example::R::k:
push rbp
mov rbp, rsp
mov eax, esi
pop rbp
ret
example::R::j:
push rbp
mov rbp, rsp
cmp byte ptr [rdi], 0
lea rdi, [rdi + 1]
lea rax, [rip + vtable.1]
lea rcx, [rip + vtable.0]
cmove rcx, rax
pop rbp
jmp qword ptr [rcx + 24]
vtable.0:
.quad core::ptr::drop_in_place
.quad 0
.quad 1
.quad <example::O as example::A>::k
vtable.1:
.quad core::ptr::drop_in_place
.quad 0
.quad 1
.quad <example::P as example::A>::k
Notice how much worse R::j() is then R::k()
It's somewhat not expected that non-generic trait object calls will be devirtualized -- if you want guarantees on that, you'll need to use generics. However, this does seem like a case where we can possibly do better.
cc @arielb1 -- possible LLVM issue
Mark Rousskov at 2017-11-05 01:26:55
How would you write j() with generics?
Jeff Muizelaar at 2017-11-05 02:43:46
I think LLVM just does never does this sort of split inlining - it never inlines a function unless it knows the exact call target, and it tends to merge, rather than split, similar code paths. Actually using generics to write
jwithout trait objects requires the closure to be generic over types (basicallyF: for<T: A> FnOnce(&T) -> R), which is not supported in today's Rust with closures and therefore requires an ugly manual trait implementation.I would either duplicate the code by hand or make
innera macro, as in:#![crate_type="rlib"] trait A { fn k(&self, x: i32) -> i32; } pub struct O { } pub struct P { } impl A for O { fn k(&self, x: i32) -> i32 { x } } impl A for P { fn k(&self, x: i32) -> i32 { x } } pub enum R { P(P), O(O), } // hack to work around the lack of generic closures, we don't // want to use a trait object here because performance is a // problem. macro_rules! with_inner { ($this:expr, $var:ident => $body:expr) => { match $this { &R::P(ref $var) => { $body } &R::O(ref $var) => { $body } } } } impl R { pub fn k(&self, x: i32) -> i32 { match self { &R::P(ref p) => p.k(x), &R::O(ref o) => o.k(x), } } pub fn j(&self, x: i32) -> i32 { with_inner!(self, a => a.k(x)) } } // Marker to allow seeing the assembly easily #[no_mangle] pub fn j_exported(r: &R, x: i32) -> i32 { r.j(x) }Ariel Ben-Yehuda at 2017-11-05 11:44:18
Triage: the codegen has changed, but the two functions are still different:
playground::R::k: # @playground::R::k # %bb.0: movl %esi, %eax retq # -- End function playground::R::j: # @playground::R::j # %bb.0: cmpb $1, (%rdi) leaq .Lanon.7823f3966c6de2987f3ac86f05619038.0(%rip), %rax leaq .Lanon.7823f3966c6de2987f3ac86f05619038.1(%rip), %rcx cmoveq %rax, %rcx addq $1, %rdi jmpq *24(%rcx) # TAILCALL # -- End functionSteve Klabnik at 2019-09-16 12:46:14
On
1.39.0(with-Copt-level=3) the codegen appears to do the right thing according to godbolt.org.The outputted ASM appears:
j_exported: mov eax, esi ret
This does not occur in the playground because, I don't believe you can change the optimization level.
Cody Laeder at 2019-12-09 19:22:40
@valarauca That example was showcasing how to avoid the problem in the first place. The original code still produces an indirect call.
Jonas Schievink at 2019-12-09 20:55:23
Things have changed a bit since the bug was opened, on 1.21.0 I believe, and from the last update from steveklabnik. Here's the godbolt for 1.50 with
-C opt-level=3.<example::O as example::A>::k: mov eax, esi ret example::R::k: mov eax, esi ret example::R::j: add rdi, 1 jmp qword ptr [rip + <example::O as example::A>::k@GOTPCREL]<P as A>::kseems to have been totally eliminated on account of being identical to<O as A>::k, butR::jis still a bit weird. Theaddwas presumably left behind from a previous version of the code, and ajmpwas used instead of<O as A>::kbeing inlined.The above is not great, but because of the elimination of
<P as A>::kby recent compiler versions, I think the code also no longer represents the problem very well. Making<O as A>::kand<P as A>::khave different code gives a better example. With the latter returningx + 10, you get (godbolt):<example::O as example::A>::k: mov eax, esi ret <example::P as example::A>::k: lea eax, [rsi + 10] ret example::R::k: lea eax, [rsi + 10] cmp byte ptr [rdi], 1 cmove eax, esi ret example::R::j: mov rax, rdi add rdi, 1 cmp byte ptr [rax], 1 jne .LBB3_1 mov rax, qword ptr [rip + <example::O as example::A>::k@GOTPCREL] jmp rax .LBB3_1: mov rax, qword ptr [rip + <example::P as example::A>::k@GOTPCREL] jmp raxIn the above case, it seems like it should have kept going with optimization -- inlining the calls and eventually optimizing down to the same thing as
R::k, ideally.On the other hand, in some cases, I might even hope for LLVM to replace the match with a virtualized call, depending on the number of arms and the complexity of the
A::kimplementations, etc. The match happens to be potentially very profitable in the example case, because theA::kimplementations are few and extremely simple, so inlining them would be a no-brainer. LLVM doesn't quite manage to get there though.Tyson Nottingham at 2021-02-20 00:26:03