Match checking has quadratic average complexity
If you look at the trans-stats for some of our modules, some "big functions" really stand out:
```
rustc:
  16838 insns, 67 ms, middle::trans::foreign::trans_intrinsic
  15988 insns, 145 ms, middle::typeck::check::check_expr_with_unifier
  11759 insns, 87 ms, middle::privacy::check_crate
libextra:
  5244 insns, 31 ms, terminfo::parm::expand
  5073 insns, 21 ms, time::do_strptime::parse_type
  4068 insns, 23 ms, time::do_strftime::parse_type
  3610 insns, 73 ms, terminfo::parser::compiled::parse
  2169 insns, 89 ms, net_tcp::listen_common
  1952 insns, 28 ms, getopts::getopts
  1937 insns, 193 ms, net_tcp::connect
```
If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use `match` somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.
Investigate!
/cc @toddaaro
emberian at 2013-08-13 06:04:42
/cc @msullivan I don't think it was actually toddaaro I was talking to about this...
emberian at 2013-08-13 06:05:10
I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me.
This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday.
https://issues.scala-lang.org/browse/SI-1133
Aaron Todd at 2013-08-21 17:16:08
Triage: did anyone determine if `match` was still the issue here? I'd imagine codegen has gotten better since 2013...
Steve Klabnik at 2015-01-20 19:59:02
@steveklabnik I don't know of any changes to `match` codegen that would have alleviated this; re-running the measurements...
emberian at 2015-01-20 20:11:53
Yes, `match` is still generating more code than anything else, by a significant margin.
emberian at 2015-01-20 21:10:42
@cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful :smile:
Steve Klabnik at 2015-01-20 22:07:58
`make RUSTFLAGS="-Z count-llvm-insns" | tee output`, followed by `sort -n < output > output2`, then look at the bottom of `output2` (ctrl-f for "trans_match").
emberian at 2015-01-20 22:35:40
Trans-ing a `match` appears to be quadratic in the number of arms (at least for integers and strings):

```sh
$ for N in 500 1000 2000 4000; do
    echo $N
    python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc - -Z time-passes | grep translation
  done
500
time: 0.030; rss: 65MB    translation
1000
time: 0.107; rss: 67MB    translation
2000
time: 0.427; rss: 70MB    translation
4000
time: 1.620; rss: 75MB    translation
```

(The python invocation prints code like `fn main() { match 0 { 0 => {} 1 => {} 2 => {} 3 => {} 4 => {} _ => {} } }` with `$N + 1` arms.)
Huon Wilson at 2016-01-05 05:41:09
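A quick way to confirm the quadratic shape of the timings above is to fit the exponent between successive doublings of N. This is a sketch, assuming the measured translation times follow `time ≈ c * N^k`; an exponent near 2 means quadratic growth:

```python
import math

# Measured `translation` times from the comment above: (arms, seconds).
measurements = [(500, 0.030), (1000, 0.107), (2000, 0.427), (4000, 1.620)]

# Between two runs, estimate k in time ~ N^k as log(t2/t1) / log(n2/n1).
exponents = []
for (n1, t1), (n2, t2) in zip(measurements, measurements[1:]):
    k = math.log(t2 / t1) / math.log(n2 / n1)
    exponents.append(round(k, 2))

print(exponents)  # -> [1.83, 2.0, 1.92]
```

The fitted exponents hover around 2, consistent with trans being quadratic in the number of arms.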
Results of @huonw's script above with `rustc 1.11.0-nightly (34505e222 2016-06-08)`:

```
500
time: 0.088; rss: 97MB     translation
1000
time: 0.308; rss: 100MB    translation
2000
time: 1.247; rss: 101MB    translation
4000
time: 4.723; rss: 106MB    translation
```

With `-Z orbit`:

```
500
time: 0.009; rss: 95MB     translation
1000
time: 0.015; rss: 97MB     translation
2000
time: 0.017; rss: 101MB    translation
4000
time: 0.030; rss: 104MB    translation
```

MIR trans looks to be a lot better at this case.
Brian Anderson at 2016-06-10 06:36:59
@brson
MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks.
Ariel Ben-Yehuda at 2016-06-10 22:55:50
Match checking currently takes the most time, which is rather unsurprising since I think it's O(n^2) code. Realistically, matches with this many arms are... excessive, but I plan on taking a look at the code soon to see whether it can be optimized a little, so I'm going to leave this open for the time being.
```sh
$ for N in {10..100}; do
    echo -n "$((2**$N)): "
    python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, 2 ** $N)))" | rustc - -Z time-passes | rg --color never 'match'
  done
1024: time: 0.021; rss: 93MB      match checking
2048: time: 0.083; rss: 96MB      match checking
4096: time: 0.335; rss: 99MB      match checking
8192: time: 1.340; rss: 107MB     match checking
16384: time: 5.360; rss: 117MB    match checking
32768: time: 21.446; rss: 142MB   match checking
65536: time: 88.756; rss: 193MB   match checking
131072: time: 380.206; rss: 300MB match checking
```

Mark Rousskov at 2017-05-04 23:18:18
Actually, the current match checking code seems to be wrongly/poorly implemented. The original paper had linear performance on simple sequences here.

EDIT: I interpreted the figures wrongly:

> the X-axis shows the size of matchings squared (expressed as source file size squared)
Tatsuyuki Ishi at 2018-04-22 07:37:36
I'd like to make it clear that match checking is an NP-complete problem, and quadratic average time is not bad for such a complex algorithm.
However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.
Tatsuyuki Ishi at 2018-11-05 12:17:47
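For the trivial C/switch-like shape mentioned above (integer-literal arms plus a trailing wildcard), the same answers can be computed in O(n log n) by sorting, or even O(n) with a hash set. This is only a hedged toy sketch of that idea, not rustc's actual algorithm:

```python
def check_switch_like(arms):
    """Check a match whose arms are integer literals plus a "_" wildcard.

    Returns (unreachable_indices, exhaustive). A set lookup makes each
    arm O(1), so the whole pass is linear instead of an all-pairs scan.
    """
    seen = set()
    unreachable = []
    exhaustive = False
    for i, arm in enumerate(arms):
        if exhaustive or arm in seen:    # covered by "_" or a duplicate literal
            unreachable.append(i)
        elif arm == "_":
            exhaustive = True            # the wildcard covers everything after it
        else:
            seen.add(arm)
    return unreachable, exhaustive

# Arm 2 repeats `1`, and arm 4 comes after the wildcard.
print(check_switch_like([0, 1, 1, "_", 7]))   # -> ([2, 4], True)
```

This only works because integer literals are trivially comparable for equality; general patterns (ranges, nested constructors, or-patterns) don't reduce to set membership, which is why a special case would help only the switch-like matches.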
Triage:
```sh
~$ for N in 500 1000 2000 4000; do
     echo $N
     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc +devel - -Z time-passes | grep "checking 2"
   done
500
time: 0.028    misc checking 2
1000
time: 0.087    misc checking 2
2000
time: 0.312    misc checking 2
4000
time: 1.220    misc checking 2
```

```sh
~$ for N in 500 1000 2000 4000; do
     echo $N
     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc +devel - -Z time-passes | grep "match"
   done
500
time: 0.029    rvalue promotion + match checking
1000
time: 0.087    rvalue promotion + match checking
2000
time: 0.313    rvalue promotion + match checking
4000
time: 1.233    rvalue promotion + match checking
```

Esteban Kuber at 2019-05-09 17:49:54
triage: P-medium.
This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler.
Still, it might be a fun thing for a volunteer to attack.
Felix S Klock II at 2020-02-27 21:27:46
> However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.
(Even this simpler step would probably be both easy and beneficial!)
Felix S Klock II at 2020-02-27 21:28:53
It's 2020 and

```sh
$ for K in {8..16}; do
    N=$((1 << K))
    echo -n "$((N)): "
    python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" | rustc - -Z time-passes | rg --color never 'match'
  done
256: time: 0.001; rss: 88MB      match_checking
512: time: 0.004; rss: 89MB      match_checking
1024: time: 0.013; rss: 89MB     match_checking  # huh, rss is almost constant up to here
2048: time: 0.046; rss: 93MB     match_checking
4096: time: 0.179; rss: 96MB     match_checking
8192: time: 0.715; rss: 103MB    match_checking
16384: time: 3.152; rss: 118MB   match_checking
32768: time: 11.660; rss: 148MB  match_checking
65536: time: 55.046; rss: 206MB  match_checking
```

Jubilee at 2020-09-05 21:44:52
The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check if it is redundant with any of the branches above. We could hack workarounds for some special cases like what #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it since most matches have only a few branches. (I would love some data to confirm that, if you want to help see https://github.com/rust-lang/rustc-perf/issues/792) A real solution would involve a pretty drastic redesign of the algorithm. I have been thinking about that in my spare time, but as @ishitatsuyuki mentioned, it's not easy.
Nadrieril at 2020-11-06 20:46:27
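The quadratic shape described above (for every branch, check whether it is redundant with any of the branches above it) can be seen in miniature. This is a hedged Python toy covering only integer literals and a wildcard, nothing like rustc's real usefulness algorithm, which handles arbitrarily nested patterns:

```python
def redundant_arms(arms):
    """Return indices of arms already covered by an earlier arm.

    `arms` is a list of integer literals plus the string "_" for a
    wildcard. The nested scan over all earlier arms is what makes the
    overall check quadratic in the number of arms.
    """
    redundant = []
    for i, arm in enumerate(arms):
        for earlier in arms[:i]:            # O(i) work per arm -> O(n^2) total
            if earlier == "_" or earlier == arm:
                redundant.append(i)
                break
    return redundant

# Arm 2 repeats `0`, and arm 4 is unreachable after the wildcard.
print(redundant_arms([0, 1, 0, "_", 5]))    # -> [2, 4]
```

For rich patterns the per-pair check is itself expensive (usefulness is NP-hard in general), which is why the all-pairs structure is hard to avoid without a drastic redesign.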
It's 2023 and `-Z time-passes` no longer has `translation` or `match-checking`. The only thing that's affected by huge matches is `MIR_borrow_checking`, which I assume includes match checking and lowering. If I run:

```sh
for K in {8..16}; do
  N=$((1 << K))
  echo -n "$((N)): "
  python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc - -Z time-passes 2>&1 | rg --color never 'MIR_borrow_checking'
done
```

with `rustc 1.76.0-nightly (87e1447aa 2023-11-30)` I get

```
256: time: 0.002; rss: 58MB -> 62MB ( +3MB)      MIR_borrow_checking
512: time: 0.005; rss: 59MB -> 64MB ( +5MB)      MIR_borrow_checking
1024: time: 0.014; rss: 60MB -> 65MB ( +5MB)     MIR_borrow_checking
2048: time: 0.049; rss: 62MB -> 69MB ( +7MB)     MIR_borrow_checking
4096: time: 0.161; rss: 65MB -> 76MB ( +12MB)    MIR_borrow_checking
8192: time: 0.599; rss: 72MB -> 92MB ( +20MB)    MIR_borrow_checking
16384: time: 2.309; rss: 85MB -> 110MB ( +25MB)  MIR_borrow_checking
32768: time: 9.270; rss: 114MB -> 161MB ( +47MB) MIR_borrow_checking
65536: time: 48.641; rss: 154MB -> 242MB ( +88MB) MIR_borrow_checking
```

which is still quadratic. I'd be curious to know if it's still match checking that drives this issue; how could I measure that?
If it is indeed match checking, then I'd like to close this as wontfix. I've done what I could to reduce this quadraticness (most notably https://github.com/rust-lang/rust/pull/117611) and I don't believe tackling this further would be worth it for typical matches (I have ideas but they would require a lot of allocations and increased code complexity).
Nadrieril at 2023-12-01 03:13:20
I think it's fine to close this as wontfix. Also, thanks for your work on rewriting the check passes! It's a quite complicated problem and I was surprised you were able to make it quite a bit faster.
Tatsuyuki Ishi at 2023-12-01 03:16:31
Thank you! I'll close it then, and someone can reopen if they feel it's relevant
Nadrieril at 2023-12-01 03:47:41
I've seen two people run into what I think is this issue in real use cases. They involve matching over all possible kinds of game entity types / all existing sprites. I previously reported this at #133945.
Tim (Theemathas) Chirananthavat at 2025-03-11 03:24:22
Running the silly example with `-Zself-profile`, I get for N=2^15 branches:

```
+---------------+-----------+-----------------+----------+------------+
| Item          | Self time | % of total time | Time     | Item count |
+---------------+-----------+-----------------+----------+------------+
| check_match   | 3.16s     | 79.678          | 3.16s    | 1          |
| mir_borrowck  | 549.36ms  | 13.843          | 553.29ms | 1          |
| run_linker    | 49.54ms   | 1.248           | 49.54ms  | 1          |
| mir_built     | 34.77ms   | 0.876           | 3.20s    | 1          |
+---------------+-----------+-----------------+----------+------------+
```

and for N=2^16 branches:

```
+---------------+-----------+-----------------+----------+------------+
| Item          | Self time | % of total time | Time     | Item count |
+---------------+-----------+-----------------+----------+------------+
| check_match   | 17.20s    | 86.436          | 17.20s   | 1          |
| mir_borrowck  | 2.16s     | 10.838          | 2.16s    | 1          |
| mir_built     | 90.85ms   | 0.457           | 17.31s   | 1          |
| run_linker    | 71.40ms   | 0.359           | 71.40ms  | 1          |
+---------------+-----------+-----------------+----------+------------+
```

So MIR lowering is roughly `O(N^1.5)`, borrowck is `O(N^2)`, and match checking is between `O(N^2)` and `O(N^2.5)`, with a worse constant factor :')
Nadrieril at 2025-03-11 22:12:37
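The exponents quoted above can be recovered directly from the two self-profile runs: N doubles from 2^15 to 2^16, so the exponent k in `time ≈ c * N^k` is log2 of the self-time ratio. A hedged Python check using the figures from the tables:

```python
import math

# (pass, self time at N=2^15, self time at N=2^16) in seconds, from the
# self-profile tables above.
passes = [
    ("check_match", 3.16, 17.20),
    ("mir_borrowck", 0.54936, 2.16),
    ("mir_built", 0.03477, 0.09085),
]

# With N doubling, the exponent k in time ~ N^k is log2(t2/t1).
for name, t1, t2 in passes:
    print(name, round(math.log2(t2 / t1), 2))
# prints: check_match 2.44, mir_borrowck 1.98, mir_built 1.39
```

Two data points give only a rough estimate, but they line up with the stated `O(N^2.4)`-ish match checking, `O(N^2)` borrowck, and roughly `O(N^1.5)` MIR lowering.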
Reopening because it does feel like an issue, though I don't know how likely it is that we can fix it.
Nadrieril at 2025-03-11 22:22:23