Incremental compilation regression with num-bigint

b775037
Opened by Sebastian Neubauer at 2020-01-28 23:30:22

The benchmarks of the num-bigint crate run slower by a factor of 3-4 when the crate is compiled incrementally.

This crate supports cargo bench, which makes the runtime performance easy to compare. Without incremental compilation (CARGO_INCREMENTAL=0 cargo bench):

test divide_0          ... bench:         867 ns/iter (+/- 17)
test divide_1          ... bench:      15,398 ns/iter (+/- 361)
test divide_2          ... bench:     770,862 ns/iter (+/- 14,336)
test fac_to_string     ... bench:       2,210 ns/iter (+/- 7)
test factorial_100     ... bench:       6,624 ns/iter (+/- 382)
test fib2_100          ... bench:       1,777 ns/iter (+/- 54)
test fib2_1000         ... bench:      33,224 ns/iter (+/- 1,143)
test fib2_10000        ... bench:   2,109,744 ns/iter (+/- 21,395)
test fib_100           ... bench:         871 ns/iter (+/- 30)
test fib_1000          ... bench:      15,811 ns/iter (+/- 467)
test fib_10000         ... bench:   1,037,026 ns/iter (+/- 32,844)
test fib_to_string     ... bench:         232 ns/iter (+/- 4)
test from_str_radix_02 ... bench:       2,671 ns/iter (+/- 50)
test from_str_radix_08 ... bench:       1,157 ns/iter (+/- 18)
test from_str_radix_10 ... bench:       1,348 ns/iter (+/- 61)
test from_str_radix_16 ... bench:       1,029 ns/iter (+/- 25)
test from_str_radix_36 ... bench:       1,069 ns/iter (+/- 38)
test hash              ... bench:      78,637 ns/iter (+/- 2,226)
test modpow            ... bench:  24,375,306 ns/iter (+/- 541,434)
test modpow_even       ... bench:  62,169,859 ns/iter (+/- 1,558,466)
test multiply_0        ... bench:         100 ns/iter (+/- 2)
test multiply_1        ... bench:      15,523 ns/iter (+/- 445)
test multiply_2        ... bench:     848,662 ns/iter (+/- 18,692)
test multiply_3        ... bench:   1,972,284 ns/iter (+/- 41,682)
test pow_bench         ... bench:   5,377,001 ns/iter (+/- 173,286)
test shl               ... bench:       4,543 ns/iter (+/- 166)
test shr               ... bench:       1,984 ns/iter (+/- 103)
test to_str_radix_02   ... bench:       2,196 ns/iter (+/- 65)
test to_str_radix_08   ... bench:         772 ns/iter (+/- 16)
test to_str_radix_10   ... bench:       6,922 ns/iter (+/- 209)
test to_str_radix_16   ... bench:         637 ns/iter (+/- 12)
test to_str_radix_36   ... bench:       6,851 ns/iter (+/- 190)

test result: ok. 0 passed; 0 failed; 0 ignored; 32 measured; 0 filtered out

     Running target/release/deps/gcd-c7301f4be4104d87

running 8 tests
test gcd_euclid_0064 ... bench:       9,621 ns/iter (+/- 380)
test gcd_euclid_0256 ... bench:      57,871 ns/iter (+/- 1,377)
test gcd_euclid_1024 ... bench:     268,900 ns/iter (+/- 6,397)
test gcd_euclid_4096 ... bench:   1,786,538 ns/iter (+/- 50,910)
test gcd_stein_0064  ... bench:       1,589 ns/iter (+/- 74)
test gcd_stein_0256  ... bench:       6,351 ns/iter (+/- 106)
test gcd_stein_1024  ... bench:      39,651 ns/iter (+/- 1,787)
test gcd_stein_4096  ... bench:     351,423 ns/iter (+/- 18,916)

With incremental compilation:

test divide_0          ... bench:       2,212 ns/iter (+/- 120)
test divide_1          ... bench:      46,185 ns/iter (+/- 1,319)
test divide_2          ... bench:   2,164,688 ns/iter (+/- 35,890)
test fac_to_string     ... bench:       3,051 ns/iter (+/- 41)
test factorial_100     ... bench:      18,034 ns/iter (+/- 427)
test fib2_100          ... bench:       6,455 ns/iter (+/- 224)
test fib2_1000         ... bench:     119,590 ns/iter (+/- 2,451)
test fib2_10000        ... bench:   6,462,893 ns/iter (+/- 129,404)
test fib_100           ... bench:       3,026 ns/iter (+/- 65)
test fib_1000          ... bench:      56,443 ns/iter (+/- 1,542)
test fib_10000         ... bench:   3,210,114 ns/iter (+/- 64,889)
test fib_to_string     ... bench:         433 ns/iter (+/- 40)
test from_str_radix_02 ... bench:       7,403 ns/iter (+/- 288)
test from_str_radix_08 ... bench:       2,741 ns/iter (+/- 99)
test from_str_radix_10 ... bench:       3,737 ns/iter (+/- 115)
test from_str_radix_16 ... bench:       2,830 ns/iter (+/- 78)
test from_str_radix_36 ... bench:       3,354 ns/iter (+/- 80)
test hash              ... bench:     140,451 ns/iter (+/- 2,757)
test modpow            ... bench:  78,250,953 ns/iter (+/- 1,400,794)
test modpow_even       ... bench: 191,208,389 ns/iter (+/- 10,505,051)
test multiply_0        ... bench:         396 ns/iter (+/- 12)
test multiply_1        ... bench:      45,034 ns/iter (+/- 1,049)
test multiply_2        ... bench:   3,110,243 ns/iter (+/- 114,081)
test multiply_3        ... bench:   7,193,430 ns/iter (+/- 498,651)
test pow_bench         ... bench:  13,125,197 ns/iter (+/- 1,299,791)
test shl               ... bench:       7,582 ns/iter (+/- 739)
test shr               ... bench:       3,056 ns/iter (+/- 58)
test to_str_radix_02   ... bench:       5,011 ns/iter (+/- 130)
test to_str_radix_08   ... bench:       1,877 ns/iter (+/- 29)
test to_str_radix_10   ... bench:       8,457 ns/iter (+/- 287)
test to_str_radix_16   ... bench:       1,401 ns/iter (+/- 45)
test to_str_radix_36   ... bench:       7,998 ns/iter (+/- 159)

test result: ok. 0 passed; 0 failed; 0 ignored; 32 measured; 0 filtered out

     Running target/release/deps/gcd-c7301f4be4104d87

running 8 tests
test gcd_euclid_0064 ... bench:      19,747 ns/iter (+/- 548)
test gcd_euclid_0256 ... bench:     119,759 ns/iter (+/- 1,850)
test gcd_euclid_1024 ... bench:     569,027 ns/iter (+/- 12,593)
test gcd_euclid_4096 ... bench:   4,087,824 ns/iter (+/- 538,444)
test gcd_stein_0064  ... bench:       6,511 ns/iter (+/- 2,291)
test gcd_stein_0256  ... bench:      25,980 ns/iter (+/- 1,552)
test gcd_stein_1024  ... bench:     138,574 ns/iter (+/- 5,185)
test gcd_stein_4096  ... bench:     989,603 ns/iter (+/- 42,293)

num-bigint: f656829 cargo: 0.26.0-nightly (1d6dfea44 2018-01-26) rustc: 1.25.0-nightly (27a046e93 2018-02-18)
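
To make the comparison between the two runs concrete, here is a minimal sketch that parses one line of the bench output shown above and computes the slowdown. The helper names `parse_bench_line` and `slowdown` are made up for this illustration and are not part of cargo or num-bigint; the two sample lines are copied from the divide_0 results.

```rust
/// Parses one line of `cargo bench` output, for example
/// `test divide_0 ... bench:         867 ns/iter (+/- 17)`,
/// into the benchmark name and its time in ns/iter.
/// (Hypothetical helper, written only for this illustration.)
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, timing) = rest.split_once("... bench:")?;
    // The first token after "bench:" is the time, written with thousands separators.
    let ns_text = timing.split_whitespace().next()?;
    let ns: u64 = ns_text.replace(',', "").parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// Slowdown factor of the incremental build relative to the normal build.
fn slowdown(normal_ns: u64, incremental_ns: u64) -> f64 {
    incremental_ns as f64 / normal_ns as f64
}

fn main() {
    // Sample lines taken from the two runs above.
    let normal = "test divide_0          ... bench:         867 ns/iter (+/- 17)";
    let incremental = "test divide_0          ... bench:       2,212 ns/iter (+/- 120)";

    let (name, normal_ns) = parse_bench_line(normal).expect("normal line should parse");
    let (_, incremental_ns) = parse_bench_line(incremental).expect("incremental line should parse");

    let factor = slowdown(normal_ns, incremental_ns);
    // Report both the factor and the extra time as a percentage of the normal run.
    println!(
        "{}: {:.2}x slower ({:.0}% more time per iteration)",
        name,
        factor,
        (factor - 1.0) * 100.0
    );
}
```

For divide_0 this gives a factor of about 2.55, i.e. roughly 155% more time per iteration.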

  1. Table form with percentage differences

    |Test|Normal (ns/iter)|Incremental (ns/iter)|Difference|
    |----|------|-----------|----------|
    |divide_0|867|2,212|155%|
    |divide_1|15,398|46,185|200%|
    |divide_2|770,862|2,164,688|181%|
    |fac_to_string|2,210|3,051|38%|
    |factorial_100|6,624|18,034|172%|
    |fib2_100|1,777|6,455|263%|
    |fib2_1000|33,224|119,590|260%|
    |fib2_10000|2,109,744|6,462,893|206%|
    |fib_100|871|3,026|247%|
    |fib_1000|15,811|56,443|257%|
    |fib_10000|1,037,026|3,210,114|210%|
    |fib_to_string|232|433|87%|
    |from_str_radix_02|2,671|7,403|177%|
    |from_str_radix_08|1,157|2,741|137%|
    |from_str_radix_10|1,348|3,737|177%|
    |from_str_radix_16|1,029|2,830|175%|
    |from_str_radix_36|1,069|3,354|214%|
    |gcd_euclid_0064|9,621|19,747|105%|
    |gcd_euclid_0256|57,871|119,759|107%|
    |gcd_euclid_1024|268,900|569,027|112%|
    |gcd_euclid_4096|1,786,538|4,087,824|129%|
    |gcd_stein_0064|1,589|6,511|310%|
    |gcd_stein_0256|6,351|25,980|309%|
    |gcd_stein_1024|39,651|138,574|249%|
    |gcd_stein_4096|351,423|989,603|182%|
    |hash|78,637|140,451|79%|
    |modpow|24,375,306|78,250,953|221%|
    |modpow_even|62,169,859|191,208,389|208%|
    |multiply_0|100|396|296%|
    |multiply_1|15,523|45,034|190%|
    |multiply_2|848,662|3,110,243|266%|
    |multiply_3|1,972,284|7,193,430|265%|
    |pow_bench|5,377,001|13,125,197|144%|
    |shl|4,543|7,582|67%|
    |shr|1,984|3,056|54%|
    |to_str_radix_02|2,196|5,011|128%|
    |to_str_radix_08|772|1,877|143%|
    |to_str_radix_10|6,922|8,457|22%|
    |to_str_radix_16|637|1,401|120%|
    |to_str_radix_36|6,851|7,998|17%|

    Peter Atashian at 2018-02-22 21:22:14

  2. @rust-lang/cargo, I don't think that cargo bench should use incremental compilation. The numbers one gets are not only slow, they also won't really tell you how a properly optimized build will perform.

    Michael Woerister at 2018-03-01 11:54:20

  3. @michaelwoerister: bench and run --release use equivalent options by default. If we want to change bench, we need to change run --release as well.

    That is, cargo bench gives you exactly the "properly optimized build", for the current definition of "properly optimized build".

    Alex Kladov at 2018-03-01 13:00:09

  4. Oh wait, we should not default to incremental compilation for release builds anyway. @Flakebi, did you actively opt into incremental compilation somehow?

    Michael Woerister at 2018-03-01 13:25:38

  5. @michaelwoerister wait, looks like I am 100% wrong? I was sure we use incremental compilation for --release, but looks like we actually don't, so that probably means that other facts I am sure about are wrong....

    Alex Kladov at 2018-03-01 13:29:24

  6. Yeah, verified that Cargo indeed does not enable incremental compilation for either the release or the bench profile.

    Alex Kladov at 2018-03-01 13:34:43

  7. > so that probably means that other facts I am sure about are wrong....

    :D

    Michael Woerister at 2018-03-01 13:53:07

  8. Yes, I have set CARGO_INCREMENTAL=1 (for the second test)

    Sebastian Neubauer at 2018-03-01 14:01:44

  9. @Flakebi, OK, then I'd say this "works as expected". Incremental compilation does not provide the same runtime performance as non-incremental builds.

    @rust-lang/cargo, maybe Cargo could print a warning if cargo bench is invoked together with incremental compilation (which is almost never what you really want).

    Michael Woerister at 2018-03-01 14:19:53

  10. Triage: I do not get any warning when running cargo bench with incremental compilation enabled.

    Steve Klabnik at 2020-01-28 23:30:22
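
The warning suggested in the thread could look roughly like the following sketch. It is not Cargo's implementation: `incremental_forced` and `bench_warning` are hypothetical helpers, the message text is invented, and the check assumes that only the CARGO_INCREMENTAL environment variable matters, so it would not notice incremental compilation enabled through a profile setting.

```rust
use std::env;

/// Returns true when the given `CARGO_INCREMENTAL` value forces
/// incremental compilation on (the value "1", as used in this thread).
fn incremental_forced(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Hypothetical helper: builds the warning text, or `None` if there is
/// nothing to warn about. The wording is illustrative only.
fn bench_warning(value: Option<&str>) -> Option<String> {
    if incremental_forced(value) {
        Some(
            "warning: CARGO_INCREMENTAL=1 is set; benchmark timings may not \
             reflect a fully optimized build"
                .to_string(),
        )
    } else {
        None
    }
}

fn main() {
    // Read the variable from the environment; an unset variable yields `None`.
    let value = env::var("CARGO_INCREMENTAL").ok();
    if let Some(message) = bench_warning(value.as_deref()) {
        eprintln!("{}", message);
    }
}
```

Keeping the decision in a function that takes the value as a parameter, rather than reading the environment directly, makes the check easy to exercise without changing the process environment.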