Benchmarks are run when just executing tests, leading to very slooooow tests

b1d181a
Opened by Jake Goulding at 2023-10-06 04:29:28

In my benchmarks, I generate some non-trivially sized blobs of data. Recently, my regular test runs have been very slow, and I believe it's because the benchmarks are running even though I'm not passing --bench.

bench.rs

#![feature(test)]

extern crate test;
use std::iter;

#[bench]
fn slow(b: &mut test::Bencher) {
    // This is dog-slow without optimizations...
    let s: String = iter::repeat("a").take(5*1024*1024).collect();
    println!("I made a string");
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

And running it:

$ rustc --test bench.rs
$ ./bench --nocapture

running 1 test
I made a string
test slow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Even running with --test explicitly still runs these very slow benchmarks.

  1. ping @huonw, as we talked a bit about this on IRC.

    Jake Goulding at 2015-05-11 00:49:02

  2. The benchmark should only actually be run once, not multiple times as --bench does. I made the change under the assumption that --bench functions will be relatively quick to run, since they are executed many times. Admittedly, I hadn't considered that tests are run without optimisations by default (and no-opt being very slow seems to be the fundamental issue here).

    Huon Wilson at 2015-05-11 02:48:17

  3. bench functions will be relatively quick to run, since they are executed many times

    And in my case, I do some potentially expensive setup in each benchmark. I'm careful to do that outside the iter call so it doesn't affect my benched time, but I don't really want to pay that for nothing.

    I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

    Jake Goulding at 2015-05-11 03:40:48

  4. NB. even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

    I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

    They can still trigger assertions if they're incorrect, e.g. indexing out of bounds. The motivation was https://github.com/rust-lang/rust/issues/15842.

    Huon Wilson at 2015-05-11 03:53:05

  5. Maybe move benchmarks into cargo's benches directory? I assume those won't run during cargo test.

    bluss at 2015-05-11 10:41:12

  6. move benchmarks into cargo's benches directory

    Wouldn't that impose the public/private visibility rules? I'd be sad if I couldn't get fine-grained benchmarks...

    Jake Goulding at 2015-05-11 13:30:16

  7. For what it's worth, I have a workaround as I group all my benchmarks into a bench module, separate from my test module. I can run cargo test 'test::', which is inelegant but gets the job done for now.

    I also don't know that this is a high-priority issue since benchmarking isn't a stable feature. However, I have personal interest in helping fix it. :-)

    One idea I have would be to redirect the concept of bytes. Maybe Bencher could have a "suggested" byte size:

    #[bench]
    fn slow(b: &mut test::Bencher) {
        let s: String = iter::repeat("a").take(b.suggested_len).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
    

    This could be set to a small value in test runs, and provide the side feature of allowing the benchmarking to automatically scale out different test sizes (1, 10, 100, 1000, etc).

    Another would be to have a flag denoting if benching is actually happening:

    #[bench]
    fn slow(b: &mut test::Bencher) {
        let size = if b.is_benching { 5 * 1024 * 1024 } else { 1 };
        let s: String = iter::repeat("a").take(size).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
    

    Both of these feel hacky though...

    Jake Goulding at 2015-05-11 13:37:57

  8. FWIW, I just encountered an instance where a benchmark was triggering integer overflow in my library that was only picked up by cargo test, since cargo bench disables debug-assertions by default. Maybe this indicates my tests aren't good enough, but still, it seems valuable to attack correctness from as many sides as are available.
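    (As an aside, for anyone who instead wants the overflow checks during cargo bench as well, my understanding is that Cargo lets the bench profile opt back in; a sketch using Cargo's profile keys:)

```toml
# Cargo.toml: re-enable debug assertions (and thus overflow checks)
# for `cargo bench`, which disables them by default.
[profile.bench]
debug-assertions = true
```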

    That said, I think I'd be happy to switch this to opt-in (although I would be opting-in essentially always). Alternatively, we could provide an opt-out flag.

    Huon Wilson at 2015-05-16 11:28:36

  9. even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

    This was not something that I was consciously aware of, although it makes sense with what I have read about how the benchmarking tool works. I guess it's a good thing that the benching functionality is unstable, as there do seem to be some awkward corner cases :smile_cat:

    a benchmark was triggering integer overflow in my library

    It's hard to argue with results, I suppose

    it seems valuable to attack correctness from as many sides as are available

    I know you don't mean it this way, but there's a potential slippery slope here. I like using focused tools for specific jobs. There are tools like quickcheck or afl.rs that help stretch the input space of our code to find issues we didn't think about unit testing specifically. If it makes sense to use benchmarking tools for testing, does it also make sense to use testing tools for benchmarking?

    I'd be happy to switch this to opt-in

    I don't know if it's worth changing anything yet, especially based on a single person complaining. I probably wouldn't even be talking about this if I could generate 5{KB,MB,GB} of data in a "quick" time.

    Jake Goulding at 2015-05-17 15:22:36

  10. Benchmarks still run even when run with --test.

    Chuck at 2016-09-30 10:31:26

  11. @shepmaster Do you think it'd be worth adding a --no-bench flag? That would effectively allow disabling benchmarks when you want to.

    Mark Rousskov at 2017-05-14 14:58:21

  12. adding a --no-bench flag?

    It would certainly provide a more reusable knob to turn than my current "nest all benches in a module" approach, but I'd worry that there's still the surprise inherent in "Oh, #[bench] also means #[test], it just doesn't iterate a bunch".

    If the attribute were something like #[test(bench)], that might make it more obvious, helping show the need for such a flag.

    Jake Goulding at 2017-05-15 02:27:48

  13. We could certainly deprecate #[bench] and change it to #[test(bench)], though I worry that the latter still isn't quite clear enough... it would improve the situation, at least. However, I'm also somewhat concerned about just how much annoyance deprecating #[bench] would cause, since I'd guess a large portion of the community uses it.

    Mark Rousskov at 2017-05-15 03:44:08

  14. To temporarily resolve this issue, I have done the following:

    In my Cargo.toml, I added a benchmarks feature:

    ...
    [features]
    benchmarks = []
    ...
    

    Then, I add the following to my benchmarks module:

    #[cfg(all(feature = "benchmarks", test))]
    mod benchmarks {
        ...
    }
    

    When I just run cargo test, the benchmarks aren't even compiled, since the feature is not set. I can run cargo test --features benchmarks or cargo bench --features benchmarks if I want the benchmarks compiled/run.

    Akshay Nanavati at 2017-06-03 00:06:05

  15. Is there something like #[bench(no-test)]?

    buckle2000 at 2018-05-11 10:18:29