Constant `let` binding slower than `const`
Hello.
I am currently optimizing my `des` crate, and I found that using a `let` statement can be significantly slower than using a `const` declaration.
For instance, when I run `cargo bench` with the `const` declaration here, I get:
```
running 1 test
test bench_decrypt ... bench: 1,621 ns/iter (+/- 15)
```
If I replace this `const` with a `let`, the performance is much worse:
```
running 1 test
test bench_decrypt ... bench: 18,203 ns/iter (+/- 99)
```
On another benchmark, it was about 4 times slower with `let`.
I expected a constant `let` binding to have the same performance as a `const`. Instead, the `let` declaration is significantly slower.
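A minimal sketch of the pattern being reported (hypothetical names and table contents; the real lookup tables in the `des` crate are larger and are indexed inside the decrypt loop):

```rust
// Hypothetical reduction of the issue: the same large table declared
// as `const` vs. bound with `let` inside the function.

const TABLE: [u64; 1024] = [0x0123_4567_89AB_CDEF; 1024];

fn lookup_const(i: usize) -> u64 {
    // `TABLE` can be materialized once in the binary's read-only data.
    TABLE[i % 1024]
}

fn lookup_let(i: usize) -> u64 {
    // The same data bound with `let` is built on the stack on every
    // call, which matches the slowdown reported above.
    let table: [u64; 1024] = [0x0123_4567_89AB_CDEF; 1024];
    table[i % 1024]
}

fn main() {
    // Both return the same value; only codegen differs.
    assert_eq!(lookup_const(7), lookup_let(7));
}
```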
Meta
`rustc --version --verbose`:

```
rustc 1.13.0-nightly (aef6971ca 2016-08-17)
binary: rustc
commit-hash: aef6971ca96be5f04291420cc773b8bfacb8b36d
commit-date: 2016-08-17
host: x86_64-unknown-linux-gnu
release: 1.13.0-nightly
```
Thanks for looking into this issue.
With `let` you're making the compiler put your big array on the stack, whereas with `const` it will likely be placed in a read-only section of the binary. While there's nothing inherently preventing us from upgrading this `let` to a `const` (through, e.g., constant propagation), it is not entirely clear to me that such an optimisation would always be beneficial.

Simonas Kazlauskas at 2016-08-25 22:45:15
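The placement difference described above is visible from the language side as well: a borrow of a `const` can be promoted to `'static`, while a borrow of a `let` local cannot outlive its stack frame (illustrative sketch, not code from the thread):

```rust
const DATA: [u8; 4] = [1, 2, 3, 4];

// A borrow of a `const` is promoted to a `'static` allocation,
// consistent with the data living in the binary rather than the stack.
fn from_const() -> &'static [u8; 4] {
    &DATA
}

fn main() {
    // The `let` equivalent would NOT compile:
    // fn from_let() -> &'static [u8; 4] {
    //     let data = [1, 2, 3, 4];
    //     &data // error[E0515]: cannot return reference to local variable
    // }
    assert_eq!(from_const(), &[1, 2, 3, 4]);
}
```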
@nagisa shouldn't the array get promoted anyway, so the `let` and the `const` should produce very similar code?

James Miller at 2016-08-26 01:59:26
Ok, just checked, and a `let` doesn't use the promoted constant. Presumably because it's not a reference to the constant, even though it's only used via a reference. Constant propagation would probably help, but maybe constant promotion should be improved to handle this kind of thing better?

/cc @eddyb & @rust-lang/compiler

James Miller at 2016-08-26 02:04:45
@Aatch The first constant promotion pass has a semantic role (in rvalue promotion), so it has to happen early, before MIR borrow-checking. Doing another pass for aggregates specifically has come up (I think I talked about it with @nikomatsakis), the problem is that doing it naively can inhibit other optimizations.
I believe that we should be able to work with field assignments instead of `Rvalue::Aggregate`, which would mean that constant propagation or some form of dataflow would be required to know the state before the first read from the local.

On the upside, this would work really well with inlining and even arbitrary control-flow (we can extract a deterministic subset of a function and evaluate it speculatively with miri).

Eduard-Mihai Burtescu at 2016-08-26 03:09:53
@eddyb it might be worthwhile to just have a pass that promotes in pretty much just this case. I can see large arrays being a common cause of either stack exhaustion or performance issues compared to aggregates. Even if it's replaced by a more sophisticated analysis + transformation, it'd be a good short-term gain.
That said, it seems that this issue isn't MIR-specific, as the old backend doesn't promote it either, so maybe it's not worth the effort on a short-term fix.
James Miller at 2016-08-26 03:35:58
@Aatch Sorry, I used "aggregate" in the MIR sense, which includes arrays. One thing to note is that you don't want to promote anything with `[x; n]` in it in most cases, as in-place initialization is almost always faster than copying the data from some other place in memory.

Eduard-Mihai Burtescu at 2016-08-26 03:41:16
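The caution above can be illustrated with a scratch buffer (a hypothetical example, not from the thread): a `[x; n]` initializer that is immediately mutated is written in place, so promoting it to a static and copying it in would be a pessimization.

```rust
// Hypothetical example: a zeroed scratch buffer. `[0u8; 256]` is
// initialized in place (typically a memset); promoting the array to
// read-only memory and copying it onto the stack would be slower.
fn checksum(input: &[u8]) -> u32 {
    let mut buf = [0u8; 256];
    for (i, b) in input.iter().enumerate() {
        buf[i % 256] ^= *b;
    }
    buf.iter().map(|&b| b as u32).sum()
}

fn main() {
    assert_eq!(checksum(&[]), 0);
}
```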
@eddyb Sure, but only when you need a copy, which, as in this case, you might not. The fact that you can change the `let` to a `const` and it 1) compiles and 2) is faster suggests that there's a pattern we can look for and take advantage of.

James Miller at 2016-08-26 04:03:05
Can we do this in LLVM?
Patrick Walton at 2016-08-27 19:18:19
@pcwalton I know clang doesn't, but you would have to ask someone from LLVM to know why.
Eduard-Mihai Burtescu at 2016-08-27 19:22:36
Seems pretty unfortunate that this happens. It would be nice to get an answer to @pcwalton's question. We should also probably close #45126 as a dupe of this one.

I wonder if such `let` bindings are used in some places of Stylo… @Manishearth Do you think we could have a Clippy lint for large immutable arrays on the stack?

Cc @rust-lang/wg-codegen

Anthony Ramine at 2018-04-02 15:53:39
Well. We now offer considerable `const` evaluation prowess where people might find it useful, and you pay for big binary data too, just less at execution time. Doing this optimization via promotion to `const` would preferably vanish if optimizations for size were requested.

However! The general class of optimizations that could hypothetically be applied on a similar basis is in fact what `@llvm.invariant.start` (per #22986) is intended to convey.

Jubilee at 2022-07-18 09:03:06
One thing I seem to not have mentioned is making sure you have `let xs = &[...];`, not `let xs = [...];`. With the `&` there, and when the array has no interior mutability, you should end up with a `&'static` array.

In fact, I think that even `&CONSTANT` (or `CONSTANT.method_taking_borrowed_self()`, or `CONSTANT[i]`, etc.) still requires that borrow to be promoted to `'static` to not end up making a copy of the constant on the stack.

Eduard-Mihai Burtescu at 2022-07-18 14:07:41
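The suggestion above, sketched out (array contents are arbitrary):

```rust
fn main() {
    // Borrowing the array literal lets rvalue promotion kick in: the
    // data gets a 'static home in the binary, and `xs` is just a
    // pointer rather than a stack copy of the whole array.
    let xs: &'static [u32; 4] = &[10, 20, 30, 40];

    // By contrast, `let ys = [10, 20, 30, 40];` would place the array
    // itself in the current stack frame.
    assert_eq!(xs[2], 30);
}
```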