Match checking has quadratic average complexity

05f5ec3
Opened by Graydon Hoare at 2025-03-11 22:22:23

If you look at the trans-stats for some of our modules, some "big functions" really stand out:

rustc:

16838 insns, 67 ms, middle::trans::foreign::trans_intrinsic
15988 insns, 145 ms, middle::typeck::check::check_expr_with_unifier
11759 insns, 87 ms, middle::privacy::check_crate

libextra: 

5244 insns, 31 ms, terminfo::parm::expand
5073 insns, 21 ms, time::do_strptime::parse_type
4068 insns, 23 ms, time::do_strftime::parse_type
3610 insns, 73 ms, terminfo::parser::compiled::parse
2169 insns, 89 ms, net_tcp::listen_common
1952 insns, 28 ms, getopts::getopts
1937 insns, 193 ms, net_tcp::connect

If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use match somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.

Investigate!

  1. /cc @toddaaro

    emberian at 2013-08-13 06:04:42

  2. /cc @msullivan I don't think it was actually toddaaro I was talking to about this...

    emberian at 2013-08-13 06:05:10

  3. I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me.

    This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday.

    https://issues.scala-lang.org/browse/SI-1133

    Aaron Todd at 2013-08-21 17:16:08

  4. Triage: did anyone determine if match was still the issue here? I'd imagine codegen has gotten better since 2013...

    Steve Klabnik at 2015-01-20 19:59:02

  5. @steveklabnik I don't know of any changes to match codegen that would have alleviated this, re-running the measurements...

    emberian at 2015-01-20 20:11:53

  6. Yes, match is still generating more code than anything else, by a significant margin.

    emberian at 2015-01-20 21:10:42

  7. @cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful :smile:

    Steve Klabnik at 2015-01-20 22:07:58

  8. Run

        make RUSTFLAGS="-Z count-llvm-insns" | tee output
        sort -n < output > output2

    then look at the bottom of output2 (Ctrl-F for "trans_match").

    emberian at 2015-01-20 22:35:40

  9. trans-ing a match appears to be quadratic in the number of arms (at least for integers and strings):

    $ for N in 500 1000 2000 4000; do
        echo $N
        python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | 
            rustc - -Z time-passes |
            grep translation
      done
    500
    time: 0.030; rss: 65MB  translation
    1000
    time: 0.107; rss: 67MB  translation
    2000
    time: 0.427; rss: 70MB  translation
    4000
    time: 1.620; rss: 75MB  translation
    

    (The python invocation prints code like fn main() { match 0 { 0 => {} 1 => {} 2 => {} 3 => {} 4 => {} _ => {} } } with $N + 1 arms.)

    Huon Wilson at 2016-01-05 05:41:09

  10. Results of @huonw's script above with rustc 1.11.0-nightly (34505e222 2016-06-08):

    500
    time: 0.088; rss: 97MB  translation
    1000
    time: 0.308; rss: 100MB translation
    2000
    time: 1.247; rss: 101MB translation
    4000
    time: 4.723; rss: 106MB translation
    

    With -Z orbit:

    500
    time: 0.009; rss: 95MB  translation
    1000
    time: 0.015; rss: 97MB  translation
    2000
    time: 0.017; rss: 101MB translation
    4000
    time: 0.030; rss: 104MB translation
    

    MIR trans looks to be a lot better at this case.

    Brian Anderson at 2016-06-10 06:36:59

  11. @brson

    MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks.
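
    For a concrete picture (a hand-written sketch, not an actual MIR dump), a dense integer match like the generated test case lowers in MIR to a single SwitchInt-style dispatch with one target per arm, so the block count stays linear in the number of arms:

    ```rust
    // A dense integer match: MIR builds one switch over the scrutinee
    // rather than a chain of per-arm comparisons, so the number of
    // basic blocks grows linearly with the number of arms.
    fn tiny(x: u32) -> u32 {
        match x {
            0 => 10,
            1 => 11,
            2 => 12,
            _ => 0,
        }
    }

    fn main() {
        assert_eq!(tiny(1), 11);
        assert_eq!(tiny(9), 0); // falls through to the wildcard arm
        println!("ok");
    }
    ```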

    Ariel Ben-Yehuda at 2016-06-10 22:55:50

  12. Match checking currently takes the most time, which is rather unsurprising since I think the code is O(n^2). Realistically, matches with this many arms are... excessive, but I plan to take a look at the code soon and see whether it can be optimized a little, so I'll leave this open until then.

    $ for N in {10..100}; do echo -n "$((2**$N)): "; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, 2 ** $N)))" | rustc - -Z time-passes | rg --color never 'match'; done
    1024: time: 0.021; rss: 93MB	match checking
    2048: time: 0.083; rss: 96MB	match checking
    4096: time: 0.335; rss: 99MB	match checking
    8192: time: 1.340; rss: 107MB	match checking
    16384: time: 5.360; rss: 117MB	match checking
    32768: time: 21.446; rss: 142MB	match checking
    65536: time: 88.756; rss: 193MB	match checking
    131072: time: 380.206; rss: 300MB	match checking
    

    Mark Rousskov at 2017-05-04 23:18:18

  13. Actually, the current match checking code seems to be wrongly/poorly implemented. The original paper had linear performance on simple sequences here.

    EDIT: I interpreted the figures wrongly:

    the X-axis shows the size of matchings squared, (expressed as source file size squared)

    Tatsuyuki Ishi at 2018-04-22 07:37:36

  14. I'd like to make it clear that match checking is an NP-complete problem, and quadratic average time is not very bad for such a complex algorithm.

    However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a log-linear algorithm instead.
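
    A sketch of what that special case might look like (hypothetical code, not anything in rustc): when every arm is a plain integer literal plus a trailing wildcard, reachability checking reduces to duplicate detection, which a hash set does in expected O(n) instead of comparing each arm against all arms above it:

    ```rust
    use std::collections::HashSet;

    // Hypothetical fast path for switch-like matches: an arm is
    // unreachable exactly when its literal repeats an earlier one,
    // so a set-membership test replaces the pairwise comparisons.
    fn unreachable_literal_arms(arms: &[i64]) -> Vec<usize> {
        let mut seen = HashSet::new();
        arms.iter()
            .enumerate()
            .filter(|&(_, &lit)| !seen.insert(lit))
            .map(|(i, _)| i)
            .collect()
    }

    fn main() {
        // Only arm 3 repeats an earlier literal (`1`).
        assert_eq!(unreachable_literal_arms(&[0, 1, 2, 1]), vec![3]);
        println!("ok");
    }
    ```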

    Tatsuyuki Ishi at 2018-11-05 12:17:47

  15. Triage:

    ~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "checking 2"; done
    500
      time: 0.028	misc checking 2
    1000
      time: 0.087	misc checking 2
    2000
      time: 0.312	misc checking 2
    4000
      time: 1.220	misc checking 2
    
    ~$ for N in 500 1000 2000 4000; do     echo $N;     python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |          rustc +devel - -Z time-passes | grep "match"; done
    500
        time: 0.029	rvalue promotion + match checking
    1000
        time: 0.087	rvalue promotion + match checking
    2000
        time: 0.313	rvalue promotion + match checking
    4000
        time: 1.233	rvalue promotion + match checking
    

    Esteban Kuber at 2019-05-09 17:49:54

  16. triage: P-medium.

    This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler.

    Still, it might be a fun thing for a volunteer to attack.

    Felix S Klock II at 2020-02-27 21:27:46

  17. > However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a log-linear algorithm instead.

    (Even this simpler step would probably be both easy and beneficial!)

    Felix S Klock II at 2020-02-27 21:28:53

  18. It's 2020 and

    $ for K in {8..16}; do
      N=$((1 << K))
      echo -n "$((N)): "
      python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" | 
        rustc - -Z time-passes | 
        rg --color never 'match'
    done
    256: time: 0.001; rss: 88MB	match_checking
    512: time: 0.004; rss: 89MB	match_checking
    1024: time: 0.013; rss: 89MB	match_checking # huh, rss is almost constant up to here
    2048: time: 0.046; rss: 93MB	match_checking
    4096: time: 0.179; rss: 96MB	match_checking
    8192: time: 0.715; rss: 103MB	match_checking
    16384: time: 3.152; rss: 118MB	match_checking
    32768: time: 11.660; rss: 148MB	match_checking
    65536: time: 55.046; rss: 206MB	match_checking
    

    Jubilee at 2020-09-05 21:44:52

  19. The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check if it is redundant with any of the branches above. We could hack workarounds for some special cases like what #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it since most matches have only a few branches. (I would love some data to confirm that, if you want to help see https://github.com/rust-lang/rustc-perf/issues/792) A real solution would involve a pretty drastic redesign of the algorithm. I have been thinking about that in my spare time, but as @ishitatsuyuki mentioned, it's not easy.
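
    The quadratic shape described above can be sketched with a toy model (literal and wildcard patterns only; the real usefulness algorithm is far more general):

    ```rust
    #[derive(Clone, Copy)]
    enum Pat {
        Lit(u64),
        Wild,
    }

    // Toy redundancy check with the same structure: arm i is tested
    // against every arm before it, so n arms cost O(n^2) comparisons.
    fn redundant_arms(arms: &[Pat]) -> Vec<usize> {
        (0..arms.len())
            .filter(|&i| {
                arms[..i].iter().any(|&prev| match (prev, arms[i]) {
                    (Pat::Wild, _) => true,               // `_` above covers everything
                    (Pat::Lit(a), Pat::Lit(b)) => a == b, // duplicate literal
                    _ => false,
                })
            })
            .collect()
    }

    fn main() {
        use Pat::*;
        // The repeated `Lit(1)` and the arm after `_` are redundant.
        assert_eq!(redundant_arms(&[Lit(0), Lit(1), Lit(1), Wild, Lit(2)]), vec![2, 4]);
        println!("ok");
    }
    ```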

    Nadrieril at 2020-11-06 20:46:27

  20. It's 2023 and -Z time-passes no longer has translation or match-checking. The only thing that's affected by huge matches is MIR_borrow_checking, which I assume includes match checking and lowering. If I run:

    for K in {8..16}; do
      N=$((1 << K))
      echo -n "$((N)): "
      python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |
        rustc - -Z time-passes 2>&1 |
        rg --color never 'MIR_borrow_checking'
    done
    

    with rustc 1.76.0-nightly (87e1447aa 2023-11-30) I get

    256: time:   0.002; rss:   58MB ->   62MB (   +3MB)     MIR_borrow_checking
    512: time:   0.005; rss:   59MB ->   64MB (   +5MB)     MIR_borrow_checking
    1024: time:   0.014; rss:   60MB ->   65MB (   +5MB)    MIR_borrow_checking
    2048: time:   0.049; rss:   62MB ->   69MB (   +7MB)    MIR_borrow_checking
    4096: time:   0.161; rss:   65MB ->   76MB (  +12MB)    MIR_borrow_checking
    8192: time:   0.599; rss:   72MB ->   92MB (  +20MB)    MIR_borrow_checking
    16384: time:   2.309; rss:   85MB ->  110MB (  +25MB)   MIR_borrow_checking
    32768: time:   9.270; rss:  114MB ->  161MB (  +47MB)   MIR_borrow_checking
    65536: time:  48.641; rss:  154MB ->  242MB (  +88MB)   MIR_borrow_checking
    

    which is still quadratic. I'd be curious to know whether it's still match checking that drives this issue; how could I measure that?

    If it is indeed match checking, then I'd like to close this as wontfix. I've done what I could to reduce this quadraticness (most notably https://github.com/rust-lang/rust/pull/117611) and I don't believe tackling this further would be worth it for typical matches (I have ideas but they would require a lot of allocations and increased code complexity).

    Nadrieril at 2023-12-01 03:13:20

  21. I think it's fine to close this as wontfix. Also, thanks for your work on rewriting the check passes! It's a quite complicated problem and I was surprised you were able to make it quite a bit faster.

    Tatsuyuki Ishi at 2023-12-01 03:16:31

  22. Thank you! I'll close it then, and someone can reopen if they feel it's relevant

    Nadrieril at 2023-12-01 03:47:41

  23. I've seen two people run into what I think is this issue in real use cases. They involve matching over all possible kinds of game entity types / all existing sprites. I previously reported this at #133945

    Tim (Theemathas) Chirananthavat at 2025-03-11 03:24:22

  24. Running the silly example with -Zself-profile, I get for N=2^15 branches:

    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | Item                                                   | Self time | % of total time | Time     | Item count |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | check_match                                            | 3.16s     | 79.678          | 3.16s    | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | mir_borrowck                                           | 549.36ms  | 13.843          | 553.29ms | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | run_linker                                             | 49.54ms   | 1.248           | 49.54ms  | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | mir_built                                              | 34.77ms   | 0.876           | 3.20s    | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    

    and for N=2^16 branches

    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | Item                                                   | Self time | % of total time | Time     | Item count |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | check_match                                            | 17.20s    | 86.436          | 17.20s   | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | mir_borrowck                                           | 2.16s     | 10.838          | 2.16s    | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | mir_built                                              | 90.85ms   | 0.457           | 17.31s   | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    | run_linker                                             | 71.40ms   | 0.359           | 71.40ms  | 1          |
    +--------------------------------------------------------+-----------+-----------------+----------+------------+
    

    So MIR lowering is roughly O(N^1.5), borrowck is O(N^2), and match checking is between O(N^2) and O(N^2.5), with a worse constant factor :')
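
    For reference, those exponents fall out of the two runs directly: doubling N multiplies an O(N^k) pass by 2^k, so k ≈ log2(t(2N) / t(N)). A quick check against the self-profile numbers above:

    ```rust
    // Empirical growth exponent between the N = 2^15 and N = 2^16
    // runs: k ≈ log2(t(2N) / t(N)) for a pass scaling as O(N^k).
    fn exponent(t_n: f64, t_2n: f64) -> f64 {
        (t_2n / t_n).log2()
    }

    fn main() {
        println!("check_match:  k ≈ {:.2}", exponent(3.16, 17.20));      // ≈ 2.4
        println!("mir_borrowck: k ≈ {:.2}", exponent(0.549, 2.16));      // ≈ 2.0
        println!("mir_built:    k ≈ {:.2}", exponent(0.03477, 0.09085)); // ≈ 1.4
    }
    ```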

    Nadrieril at 2025-03-11 22:12:37

  25. Reopening because it does feel like an issue, though I don't know how likely it is that we can fix it.

    Nadrieril at 2025-03-11 22:22:23