Performance left on the table?
I've been playing with Rust optimizations today and discovered something curious: it looks like LLVM opt does a better job at optimizing Rust IR than rustc:
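set -v
# Build with rustc -O and capture the printed linker command line for reuse below.
LINK=`rustc n_body.rs -O -o n_body-rustc -Zprint-link-args`
# Emit LLVM IR (unoptimized unless the trailing -O is uncommented).
rustc n_body.rs --emit=llvm-ir -o n_body.ll #-O
# Optimize the IR with the stock opt -O3 pipeline.
opt n_body.ll -O3 -S -o n_body-opt.ll
# Optional second opt run over the already optimized IR.
#opt n_body-opt.ll -O3 -S -o n_body-opt.ll
# Compile the optimized IR to an object file.
llc n_body-opt.ll -relocation-model=pic -filetype=obj -o n_body-opt.o
# Re-run the captured link command with the opt-built object swapped in.
eval ${LINK/n_body-rustc.0.o/n_body-opt.o} -o n_body-opt
time ./n_body-rustc 20000000
time ./n_body-opt 20000000
set +v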
produces:
...
time ./n_body-rustc 20000000
-0.169075164
-0.169031665
real 0m3.876s
user 0m3.868s
sys 0m0.008s
time ./n_body-opt 20000000
-0.169075164
-0.169031665
real 0m3.626s
user 0m3.610s
sys 0m0.016s
(n_body.rs having been copied verbatim from benchmarksgame)
Adding -O to the second rustc invocation and running the IR through opt twice pushes the time down even further:
...
time ./n_body-opt 20000000
-0.169075164
-0.169031665
real 0m3.147s
user 0m3.135s
sys 0m0.012s
This happens on both stable 1.11.0 and the current nightly (x86_64-unknown-linux-gnu).
Does this mean that we could have -C opt-level=4?
Mostly a duplicate of https://github.com/rust-lang/rust/issues/33299 AFAICT.
Simonas Kazlauskas at 2016-09-02 19:34:57
opt doesn't use a custom pass list, though. I can understand the reluctance to custom-tailor passes to Rust, but here we have a standard LLVM tool doing better than rustc.
cc @alexcrichton, @dotdash
vadimcn at 2016-09-02 21:35:06
@vadimcn then rustc does (or might have forgotten to populate the pass manager fully, or does not configure the pass manager the same way). The lists of passes run by rustc and by opt, both compiled against the same system LLVM, differ (already kinda obvious from rustc -O → opt -O2 → link producing different results from rustc -O → link) in that opt runs the SLP vectorizer.
BTW, rustc -O is -Copt-level=2 AFAIR, so a comparison between that and opt -O3 is kinda unfair.
(The fact that running something through opt twice changes anything, if it indeed does, also seems to be a bug; AFAIR that's not supposed to be the case.)
Simonas Kazlauskas at 2016-09-02 22:05:37
It's basically always been our intention to configure the LLVM pass manager in the same way that LLVM/clang do. This is susceptible to drift across LLVM versions, though, as LLVM's idioms are updated and our configuration isn't.
I'd consider any difference between clang's pass manager and ours as a bug.
Alex Crichton at 2016-09-02 22:25:49
Okay, my bad: I forgot that -O == -Copt-level=2. Passing -Copt-level=3 makes rustc even with opt.
On the other hand, though, multiple opt passes still beat that by ~15%.
vadimcn at 2016-09-03 01:40:20
It isn't terribly surprising that throwing many additional optimization passes at it has some effect. Even -O3 is required to weigh performance wins against compile time losses, and even when individual passes are idempotent their interaction usually isn't. The more interesting question is whether there is a change that could be made so that one run of -O3 (that takes comparable time to one run of the status quo) reliably gives these improvements for more than one program. For example, it might just be a matter of throwing in an additional cleanup pass somewhere. But if there is no easier way than running (almost) the entire optimization pipeline, and it's only this specific (kind of) program that wins, it's probably not worthwhile.
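The point about idempotence can be shown with a toy model. This is only a sketch and has nothing to do with LLVM's actual passes: the `Expr` type, the two "passes" (`fold_constants`, `drop_add_zero`) and the `pipeline` ordering are all made up for illustration. Each pass on its own is idempotent, yet running the fixed pipeline a second time still improves the result, because the second pass creates work for the first.

```rust
// Toy expression IR; hypothetical, not related to LLVM IR.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
}

use Expr::*;

// Pass A: fold `Const + Const` bottom-up. Running it twice changes nothing.
fn fold_constants(e: &Expr) -> Expr {
    match e {
        Add(l, r) => match (fold_constants(l), fold_constants(r)) {
            (Const(a), Const(b)) => Const(a + b),
            (l, r) => Add(Box::new(l), Box::new(r)),
        },
        other => other.clone(),
    }
}

// Pass B: rewrite `x + 0` and `0 + x` to `x` bottom-up. Also idempotent.
fn drop_add_zero(e: &Expr) -> Expr {
    match e {
        Add(l, r) => match (drop_add_zero(l), drop_add_zero(r)) {
            (Const(0), x) | (x, Const(0)) => x,
            (l, r) => Add(Box::new(l), Box::new(r)),
        },
        other => other.clone(),
    }
}

// Fixed pass order: B first, then A. A can produce a zero that B never sees.
fn pipeline(e: &Expr) -> Expr {
    fold_constants(&drop_add_zero(e))
}

// Sample input: x + (1 + -1)
fn sample() -> Expr {
    Add(
        Box::new(Var("x")),
        Box::new(Add(Box::new(Const(1)), Box::new(Const(-1)))),
    )
}
```

Calling `pipeline(&sample())` leaves `x + 0` behind, and only a second `pipeline` call reduces it to `x`. In this toy, swapping the order of the two passes (or appending one more `drop_add_zero` as a cleanup step) gets the same result in a single run, which is the kind of targeted change asked about above, as opposed to rerunning everything.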
Hanna Kruppe at 2016-09-03 07:24:55
Re-reading this issue a couple of times leads me to believe that there's nothing in particular to do here. We could change the pass order or add more passes, but without a concrete proposal as to what and how, I don't think this is super helpful. If anyone's interested in pursuing this, we should open a thread on internals.rust-lang.org and discuss it there to get more eyes on it.
Mark Rousskov at 2017-05-13 04:17:49
So, um, what's the motivation for closing this? I created this issue to document my findings about potential perf improvements. I think that a project's bug database is the appropriate place to maintain such information, rather than some web forum. IMO, it is fine to have issues on the books that no one wants to work on immediately, as long as they contain useful information.
vadimcn at 2017-05-13 05:50:18
I'm fine with reopening; it just feels like this particular issue doesn't have anything we can do to fix it. That is, if we(/you) find other things we can do to improve LLVM's codegen, then opening other issues is better. This issue to me looked like it was purely about running passes more times, which to me seemed like not something we wanted to track. Again, though, if you disagree -- please reopen!
Mark Rousskov at 2017-05-13 12:30:55
"I'm fine with reopening; it just feels like this particular issue doesn't have anything we can do to fix it."
I disagree: rustc does not have to use the default set of llvm passes. We could create our own, or add to the default one.
Now, it's true that this will require a lot of time and experimentation, and nobody has expressed the desire to delve into it right now; however, someone might still come along who is into this kind of thing (an intern project, maybe?).
By closing such issues we are making them indistinguishable from truly fixed bugs and pretty much ensure that they will never be revisited.
I'd much rather keep them open but tagged appropriately. We already have the I-Wishlist tag; could we use it here?
cc: @brson
vadimcn at 2017-05-20 21:06:46
My understanding of the discussion was that we were worried that a custom pass list would create problems later down the road. However, I do see that we could get wins "now" so reopening.
Mark Rousskov at 2017-05-20 21:56:50
Triage: I agree with what @Mark-Simulacrum said back in 2017, and I think that the lack of movement in the last few years bears that out. I'll leave it to the compiler team to decide, I guess.
Steve Klabnik at 2020-05-08 21:30:58
There has been potentially some movement on the LLVM side with their new pass manager, but I don't know to what extent that has implications here.
I suspect that rolling our own pass list is likely to be fraught with difficulty and may even lead to things like miscompilation or hangs and such (because LLVM's tests no longer test what we do), but that's pure speculation.
Mark Rousskov at 2020-05-08 21:41:52