Constant let binding slower than const

6cc282f
Opened by antoyo at 2023-10-08 20:49:33

Hello. I am currently optimizing my des crate and I found that using a let binding can be significantly slower than using a const.

For instance, when I run cargo bench with the const declaration here, I get:

running 1 test
test bench_decrypt ... bench:       1,621 ns/iter (+/- 15)

If I replace this const with a let, the performance is much worse:

running 1 test
test bench_decrypt ... bench:      18,203 ns/iter (+/- 99)

On another benchmark, I think it was 4 times slower with let.

I expected a let binding initialized with a constant value to have the same performance as a const.

Instead, the let declaration is slower than a const.
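
A minimal sketch of the pattern (names and values invented here; the actual table lives in the des crate): a large lookup table used in a hot loop, declared either as a const item or as a let binding.

```rust
// `const`: the table can be placed in the binary's read-only data,
// and indexing reads straight from it.
const TABLE_CONST: [u64; 256] = [0x0123_4567_89ab_cdef; 256];

fn decrypt_step_const(i: usize) -> u64 {
    TABLE_CONST[i % 256]
}

// `let`: the table is (re)built on the stack each time the function
// runs, which is where the reported slowdown comes from.
fn decrypt_step_let(i: usize) -> u64 {
    let table: [u64; 256] = [0x0123_4567_89ab_cdef; 256];
    table[i % 256]
}
```

Both versions compute the same values; only where the array lives differs.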

Meta

rustc --version --verbose:

rustc 1.13.0-nightly (aef6971ca 2016-08-17)
binary: rustc
commit-hash: aef6971ca96be5f04291420cc773b8bfacb8b36d
commit-date: 2016-08-17
host: x86_64-unknown-linux-gnu
release: 1.13.0-nightly

Thank you for looking into this issue.

  1. With let you’re making the compiler put your big array on the stack, whereas with const it will likely be placed in a read-only section of the binary. There’s nothing inherently preventing us from upgrading this let to a const (through, e.g., constant propagation), but it is not entirely clear to me that such an optimisation would always be beneficial.

    Simonas Kazlauskas at 2016-08-25 22:45:15

  2. @nagisa shouldn't the array get promoted anyway, so the let and the const should produce very similar code?

    James Miller at 2016-08-26 01:59:26

  3. Ok, just checked and a let doesn't use the promoted constant. Presumably because it's not a reference to the constant, even though it's only used via a reference. Constant propagation would probably help, but maybe constant promotion should be improved to handle this kind of thing better?

    /cc @eddyb & @rust-lang/compiler

    James Miller at 2016-08-26 02:04:45
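
A small illustration of the observation above (hypothetical names): even when a let-bound array is only ever read through a reference, binding the constant by value still copies it onto the stack, whereas borrowing the constant can use the promoted data directly.

```rust
const TABLE: [u32; 4] = [10, 20, 30, 40];

// Binds by value: TABLE is copied onto the stack, even though the
// copy is only ever read.
fn via_let(i: usize) -> u32 {
    let t = TABLE;
    t[i % 4]
}

// Borrows the constant: the borrow can be promoted, so no stack
// copy is needed.
fn via_ref(i: usize) -> u32 {
    let t = &TABLE;
    t[i % 4]
}
```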

  4. @Aatch The first constant promotion pass has a semantic role (in rvalue promotion), so it has to happen early, before MIR borrow-checking. Doing another pass for aggregates specifically has come up (I think I talked about it with @nikomatsakis), the problem is that doing it naively can inhibit other optimizations.

    I believe that we should be able to work with field assignments instead of Rvalue::Aggregate, which would mean that constant propagation or some form of dataflow would be required to know the state before the first read from the local.

    On the upside, this would work really well with inlining and even arbitrary control-flow (we can extract a deterministic subset of a function and evaluate it speculatively with miri).

    Eduard-Mihai Burtescu at 2016-08-26 03:09:53

  5. @eddyb it might be worthwhile to just have a pass that promotes in pretty much just this case. I can see large arrays, compared to other aggregates, being a common cause of either stack exhaustion or performance issues. Even if it's replaced by a more sophisticated analysis + transformation, it'd be a good short-term gain.

    That said, it seems that this issue isn't MIR-specific, as the old backend doesn't promote it either, so maybe it's not worth the effort on a short-term fix.

    James Miller at 2016-08-26 03:35:58

  6. @Aatch Sorry, I used "aggregate" in the MIR sense, which includes arrays. One thing to note is that you don't want to promote anything with [x; n] in it in most cases, as in-place initialization is almost always faster than copying the data from some other place in memory.

    Eduard-Mihai Burtescu at 2016-08-26 03:41:16
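
The caveat about [x; n] can be sketched as follows (sizes invented): a repeat-expression array is filled in place, typically lowered to a memset-like fill of the destination, so forcing it out to static memory and copying it back in could be a pessimization.

```rust
// In-place initialization: the compiler can lower this to a simple
// fill of the destination buffer, with no copy from elsewhere in
// memory. Promoting this array would trade that fill for a copy.
fn fresh_buffer() -> [u8; 4096] {
    [0u8; 4096]
}
```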

  7. @eddyb sure, but only when you need a copy, which, as in this case, you might not. The fact that you can change the let to a const and have it 1) compile and 2) be faster suggests that there's a pattern we can look for and take advantage of.

    James Miller at 2016-08-26 04:03:05

  8. Can we do this in LLVM?

    Patrick Walton at 2016-08-27 19:18:19

  9. @pcwalton I know clang doesn't, but you would have to ask someone from LLVM to know why.

    Eduard-Mihai Burtescu at 2016-08-27 19:22:36

  10. Seems pretty unfortunate that this happens. It would be nice to get an answer to @pcwalton's question. We should also probably close #45126 as a dupe of this one.

    I wonder if such let bindings are used in some places in Stylo… @Manishearth Do you think we could have a Clippy lint for large immutable arrays on the stack?

    Cc @rust-lang/wg-codegen

    Anthony Ramine at 2018-04-02 15:53:39

  11. Well. We now offer considerable const-evaluation prowess where people might find it useful, and you still pay for the big binary data, just less at execution time. If this optimization were done via promotion to a const, it should preferably be disabled when optimizing for size.

    However! The general class of optimizations that could hypothetically be applied on a similar basis is in fact what @llvm.invariant.start (per #22986) is intended to convey.

    Jubilee at 2022-07-18 09:03:06

  12. One thing I seem not to have mentioned is making sure you have let xs = &[...]; rather than let xs = [...];. With the & there, and when the array has no interior mutability, you should end up with a &'static array.

    In fact, I think that even &CONSTANT (or CONSTANT.method_taking_borrowed_self() or CONSTANT[i] etc.) still requires that borrow to be promoted to 'static to not end up making a copy of the constant on the stack.

    Eduard-Mihai Burtescu at 2022-07-18 14:07:41
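
The &[...] form can be checked directly (a minimal sketch): the borrow of the literal is promoted, so the reference is 'static, and one observable consequence is that repeated calls hand back the same address rather than a fresh stack copy.

```rust
// The borrow of the array literal is promoted: `table()` returns a
// `&'static` reference into the binary's data, not a stack copy.
// (This relies on the array having no interior mutability.)
fn table() -> &'static [u32; 4] {
    &[1, 2, 3, 4]
}
```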