`write!(wr, "foo")` is 10% to 72% slower than `wr.write("foo".as_bytes())`

This example demonstrates that the trivial case of `write!(wr, "foo")` is much slower than calling `wr.write("foo".as_bytes())`:
```rust
extern mod extra;

use std::io::mem::MemWriter;
use extra::test::BenchHarness;

#[bench]
fn bench_write_value(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        for _ in range(0, 1000) {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            write!(wr, "{}", "abc");
        }
    });
}
```
With no optimizations:

```
running 4 tests
test bench_write_macro1 ... bench: 280153 ns/iter (+/- 73615)
test bench_write_macro2 ... bench: 322462 ns/iter (+/- 24886)
test bench_write_ref ... bench: 79974 ns/iter (+/- 3850)
test bench_write_value ... bench: 78709 ns/iter (+/- 4003)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```
With `--opt-level=3`:

```
running 4 tests
test bench_write_macro1 ... bench: 62397 ns/iter (+/- 5485)
test bench_write_macro2 ... bench: 80203 ns/iter (+/- 3355)
test bench_write_ref ... bench: 55275 ns/iter (+/- 5156)
test bench_write_value ... bench: 56273 ns/iter (+/- 7591)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```
Is there anything we can do to improve this? I can think of a couple of options, but I bet there are more:

- Special-case no-argument `write!` to compile down into `wr.write("foo".as_bytes())`. If we go this route, it'd be nice to also convert all-string calls like `write!("foo {} {}", "bar", "baz")`.
- Revive `wr.write_str("foo")`. From what I understand, that's being blocked on #6164.
- Figure out why LLVM isn't able to optimize away the `write!` overhead. Are there functions that should be getting inlined that are not? My scattershot attempt didn't get any results.
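For concreteness, here is a minimal sketch in current Rust of the rewrite the first option proposes. It only demonstrates that the two calls are observably equivalent for a literal-only format string; the variable names are mine:

```rust
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut via_macro: Vec<u8> = Vec::new();
    let mut via_write_all: Vec<u8> = Vec::new();

    // write! with a literal-only format string still routes through write_fmt
    write!(via_macro, "foo")?;
    // the direct expansion the first option asks for: a plain byte write
    via_write_all.write_all(b"foo")?;

    // same bytes either way; only the cost differs
    assert_eq!(via_macro, via_write_all);
    Ok(())
}
```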
I suspect that `fn bench_write_ref` does not make virtual calls. Adding this:

```rust
#[inline(never)]
fn writer_write(w: &mut Writer, b: &[u8]) {
    w.write(b);
}

#[bench]
fn bench_write_virt(bh: &mut BenchHarness) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0, 1000) {
            writer_write(wr, "abc".as_bytes());
        }
    });
}
```

I get, with `--opt-level 3`:

```
running 5 tests
test bench_write_macro1 ... bench: 680823 ns/iter (+/- 34497)
test bench_write_macro2 ... bench: 950790 ns/iter (+/- 72309)
test bench_write_ref ... bench: 505846 ns/iter (+/- 41965)
test bench_write_value ... bench: 511815 ns/iter (+/- 36681)
test bench_write_virt ... bench: 553466 ns/iter (+/- 43716)
```

So, the virtual call has an impact, but it does not fully explain the slowness of `write!()` shown by this benchmark.

Guillaume P. at 2013-12-02 08:56:08
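A modern translation of that virtual-call probe, for readers on current Rust (a sketch; `writer_write` is the same hypothetical helper, ported to `dyn Write`):

```rust
use std::io::Write;

// Forced-indirect call through the Write vtable, mirroring the
// #[inline(never)] probe above.
#[inline(never)]
fn writer_write(w: &mut dyn Write, b: &[u8]) -> std::io::Result<()> {
    w.write_all(b) // dispatched at runtime through the vtable
}

fn main() -> std::io::Result<()> {
    let mut mem: Vec<u8> = Vec::new();
    // static call: the concrete type is known, so this can inline
    mem.write_all(b"abc")?;
    // dynamic call: same operation, but through &mut dyn Write
    writer_write(&mut mem, b"abc")?;
    assert_eq!(mem, b"abcabc");
    Ok(())
}
```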
Visiting for bug triage. This seems to still be present. The code to run the benchmarks is now:

```rust
extern crate test;

use std::io::MemWriter;
use test::Bencher;

#[bench]
fn bench_write_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        for _ in range(0u, 1000) {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = MemWriter::new();
        let wr = &mut mem as &mut Writer;
        for _ in range(0u, 1000) {
            write!(wr, "{}", "abc");
        }
    });
}
```

Without optimizations:

```
running 4 tests
test bench_write_macro1 ... bench: 1470468 ns/iter (+/- 291966)
test bench_write_macro2 ... bench: 1799612 ns/iter (+/- 316293)
test bench_write_ref ... bench: 1336574 ns/iter (+/- 251664)
test bench_write_value ... bench: 1317880 ns/iter (+/- 254668)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```

With `--opt-level=3`:

```
running 4 tests
test bench_write_macro1 ... bench: 127671 ns/iter (+/- 1452)
test bench_write_macro2 ... bench: 196158 ns/iter (+/- 2053)
test bench_write_ref ... bench: 43881 ns/iter (+/- 453)
test bench_write_value ... bench: 43859 ns/iter (+/- 336)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```

Nick Hamann at 2014-08-06 23:35:03
Still an issue, using the following code:

```rust
#![allow(unused_must_use)]
#![feature(test)]

extern crate test;

use std::io::Write;
use std::vec::Vec;
use test::Bencher;

#[bench]
fn bench_write_value(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        for _ in 0..1000 {
            mem.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_ref(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            wr.write("abc".as_bytes());
        }
    });
}

#[bench]
fn bench_write_macro1(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            write!(wr, "abc");
        }
    });
}

#[bench]
fn bench_write_macro2(bh: &mut Bencher) {
    bh.iter(|| {
        let mut mem = Vec::new();
        let wr = &mut mem as &mut Write;
        for _ in 0..1000 {
            write!(wr, "{}", "abc");
        }
    });
}
```

Typically giving something like the following:

```
$ cargo bench
running 4 tests
test bench_write_macro1 ... bench: 21,604 ns/iter (+/- 82)
test bench_write_macro2 ... bench: 29,273 ns/iter (+/- 85)
test bench_write_ref ... bench: 1,396 ns/iter (+/- 387)
test bench_write_value ... bench: 1,391 ns/iter (+/- 163)
```

Deleted user at 2015-10-30 01:15:07
I got approximately the same results with rust nightly.
ariasuni at 2017-06-23 01:42:02
Just for the record, running with `v1.20.0-nightly`:

```
$ cargo bench
running 4 tests
test bench_write_macro1 ... bench: 36,556 ns/iter (+/- 69)
test bench_write_macro2 ... bench: 54,377 ns/iter (+/- 958)
test bench_write_ref ... bench: 13,730 ns/iter (+/- 24)
test bench_write_value ... bench: 13,755 ns/iter (+/- 81)
```

Ritiek Malhotra at 2017-07-14 15:50:39
Today:

```
running 4 tests
test bench_write_macro1 ... bench: 16,220 ns/iter (+/- 982)
test bench_write_macro2 ... bench: 25,542 ns/iter (+/- 2,220)
test bench_write_ref ... bench: 4,889 ns/iter (+/- 314)
test bench_write_value ... bench: 4,819 ns/iter (+/- 956)
```

Steve Klabnik at 2018-09-24 16:50:48
Two years later:

```
running 4 tests
test bench_write_macro1 ... bench: 17,561 ns/iter (+/- 174)
test bench_write_macro2 ... bench: 23,285 ns/iter (+/- 2,771)
test bench_write_ref ... bench: 3,234 ns/iter (+/- 194)
test bench_write_value ... bench: 3,238 ns/iter (+/- 123)

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured; 0 filtered out
```

```
❯ rustc +nightly --version
rustc 1.47.0-nightly (30f0a0768 2020-08-18)
```

Steve Klabnik at 2020-08-26 19:36:34
I was curious if the difference was in the implementation of `io::Write` vs. `fmt::Write`, or in the details of `write_fmt` requiring `format_args!`, which is relevant since `write!` can be asked to "serve" either case.
To discover more information on this, I compared Vec to String, a seemingly "apples to apples...ish" comparison. https://gist.github.com/workingjubilee/2d2e3a7fded1c2101aafb51dc79a7ec5
```
running 10 tests
test string_write_fmt ... bench: 10,053 ns/iter (+/- 1,141)
test string_write_macro1 ... bench: 10,177 ns/iter (+/- 2,363)
test string_write_macro2 ... bench: 17,499 ns/iter (+/- 1,847)
test string_write_ref ... bench: 2,270 ns/iter (+/- 265)
test string_write_value ... bench: 2,333 ns/iter (+/- 126)
test vec_write_fmt ... bench: 15,722 ns/iter (+/- 1,673)
test vec_write_macro1 ... bench: 15,767 ns/iter (+/- 1,638)
test vec_write_macro2 ... bench: 23,968 ns/iter (+/- 8,942)
test vec_write_ref ... bench: 2,296 ns/iter (+/- 178)
test vec_write_value ... bench: 2,230 ns/iter (+/- 235)

test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured; 0 filtered out
```

Jubilee at 2020-09-05 23:54:43
I found it interesting that this example revealed that String actually had less overhead than Vec on `write!` (the benchmark had high variance, but I considered it indicative enough). Then I looked and saw:

```rust
fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
    // Create a shim which translates an io::Write to a fmt::Write and saves
    // off I/O errors. instead of discarding them
    struct Adaptor<'a, T: ?Sized + 'a> {
        inner: &'a mut T,
        error: Result<()>,
    }

    /* More code related to implementing and using the resulting shim,
     * seemingly involving a lot of poking a reference at runtime??? */
}
```

So I am no longer surprised. I believe it is possible to fast-path some cases, but this is not actually an apples-to-apples comparison here. The orange is `write_fmt`, which has to use a variety of formatting machinery. What needs to happen is that `write!` should skip `write_fmt` in simple cases where the formatting machinery is not required, or the formatting machinery should itself be much faster in the case of the obvious fast path.

Jubilee at 2020-09-07 22:19:46
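To illustrate the shim pattern being described, here is a self-contained sketch of the idea (my own rendering, not the actual std source; `Adaptor` and `write_fmt_sketch` are illustrative names): wrap an `io::Write` so the `fmt` machinery, which only speaks `fmt::Write`, can drive it, while capturing the real `io::Error` instead of discarding it.

```rust
use std::fmt;
use std::io;

// The shim: forwards formatted text to an io::Write and remembers
// the first real I/O error, since fmt::Error carries no payload.
struct Adaptor<'a, T: ?Sized> {
    inner: &'a mut T,
    error: io::Result<()>,
}

impl<T: io::Write + ?Sized> fmt::Write for Adaptor<'_, T> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        match self.inner.write_all(s.as_bytes()) {
            Ok(()) => Ok(()),
            Err(e) => {
                self.error = Err(e); // stash the real error
                Err(fmt::Error)
            }
        }
    }
}

// Sketch of what a write_fmt built on the shim looks like.
fn write_fmt_sketch<W: io::Write + ?Sized>(w: &mut W, args: fmt::Arguments<'_>) -> io::Result<()> {
    let mut shim = Adaptor { inner: w, error: Ok(()) };
    match fmt::write(&mut shim, args) {
        Ok(()) => Ok(()),
        Err(_) if shim.error.is_err() => shim.error,
        Err(_) => Err(io::Error::new(io::ErrorKind::Other, "formatter error")),
    }
}

fn main() -> io::Result<()> {
    let mut buf: Vec<u8> = Vec::new();
    write_fmt_sketch(&mut buf, format_args!("{} {}", "hello", 42))?;
    assert_eq!(buf, b"hello 42");
    Ok(())
}
```

Note how even a literal-only format string pays for the trait-object indirection and the error bookkeeping on this path.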
(checking oldest issues for fun and profit)

Did `fmt::Arguments::as_str` improve the situation here?

Crystal Durham at 2022-07-22 16:56:01
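To make the question concrete, here is a small probe of what `fmt::Arguments::as_str` reports (`as_str_of` is just a throwaway helper for the demo):

```rust
// as_str() gives Some(_) for a literal-only format string and None
// once runtime formatting is actually needed.
fn as_str_of(args: std::fmt::Arguments<'_>) -> Option<&'static str> {
    args.as_str()
}

fn main() {
    // a plain literal is recoverable without any formatting work
    assert_eq!(as_str_of(format_args!("abc")), Some("abc"));
    // a runtime argument forces the full machinery, so no &str shortcut
    let n = std::hint::black_box(42);
    assert_eq!(as_str_of(format_args!("{n}")), None);
}
```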
Not really (`nightly 2022-07-23`):

```
running 5 tests
test string_write_fmt ... bench: 6,953 ns/iter (+/- 161)
test string_write_macro1 ... bench: 7,034 ns/iter (+/- 227)
test string_write_macro2 ... bench: 9,308 ns/iter (+/- 760)
test string_write_ref ... bench: 688 ns/iter (+/- 32)
test string_write_value ... bench: 673 ns/iter (+/- 34)
```

Jakub Beránek at 2022-08-10 15:27:57
It seems to be just because nobody has bothered to yet. https://gist.github.com/CAD97/2922d3132777310c1e22564f6c6dabdf

```
running 8 tests
test new_string_write_macro1 ... bench: 1,875 ns/iter (+/- 228)
test new_string_write_macro2 ... bench: 13,594 ns/iter (+/- 999)
test new_vec_write_macro1 ... bench: 1,637 ns/iter (+/- 175)
test new_vec_write_macro2 ... bench: 13,808 ns/iter (+/- 1,320)
test old_string_write_macro1 ... bench: 10,080 ns/iter (+/- 1,618)
test old_string_write_macro2 ... bench: 13,880 ns/iter (+/- 1,327)
test old_vec_write_macro1 ... bench: 10,037 ns/iter (+/- 779)
test old_vec_write_macro2 ... bench: 14,783 ns/iter (+/- 1,280)
```

```diff
--- old/src/std/io/mod.rs
+++ new/src/std/io/mod.rs
@@ -1650,8 +1650,12 @@
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write_fmt(&mut self, fmt: fmt::Arguments<'_>) -> Result<()> {
+        if let Some(s) = fmt.as_str() {
+            return self.write_all(s.as_bytes());
+        }
+
         // Create a shim which translates a Write to a fmt::Write and saves
         // off I/O errors. instead of discarding them
         struct Adapter<'a, T: ?Sized + 'a> {
             inner: &'a mut T,
             error: Result<()>,
         }
```

```diff
--- old/src/core/fmt/mod.rs
+++ new/src/core/fmt/mod.rs
@@ -186,4 +186,8 @@
     #[stable(feature = "rust1", since = "1.0.0")]
     fn write_fmt(mut self: &mut Self, args: Arguments<'_>) -> Result {
+        if let Some(s) = args.as_str() {
+            self.write_str(s)
+        } else {
-        write(&mut self, args)
+            write(&mut self, args)
+        }
     }
```

Crystal Durham at 2022-08-11 10:05:15
It doesn't solve all the cases, but this definitely sounds like a nice improvement that could be done in parallel with the ongoing `fmt` rewrite attempts. Will you send a PR? :)

Jakub Beránek at 2022-08-11 10:11:34
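As a standalone sketch of the fast path that diff proposes (`write_fmt_fast` is a hypothetical free function standing in for the patched `io::Write::write_fmt`, not the actual std method):

```rust
use std::fmt;
use std::io::{self, Write};

// Literal-only Arguments skip the formatting machinery entirely;
// everything else falls back to the general write_fmt path.
fn write_fmt_fast<W: Write>(w: &mut W, args: fmt::Arguments<'_>) -> io::Result<()> {
    if let Some(s) = args.as_str() {
        return w.write_all(s.as_bytes()); // proposed fast path
    }
    w.write_fmt(args) // general path, unchanged
}

fn main() -> io::Result<()> {
    let mut buf: Vec<u8> = Vec::new();
    // literal-only: as_str() is Some("abc"), so fmt is bypassed
    write_fmt_fast(&mut buf, format_args!("abc"))?;
    // runtime argument: goes through the general machinery
    let n = std::hint::black_box(42);
    write_fmt_fast(&mut buf, format_args!("-{n}"))?;
    assert_eq!(buf, b"abc-42");
    Ok(())
}
```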
Ten years later: `rustc 1.76.0-nightly (f967532a4 2023-12-08)`

```
test bench_write_macro1 ... bench: 6,062 ns/iter (+/- 41)
test bench_write_macro2 ... bench: 6,070 ns/iter (+/- 233)
test bench_write_ref ... bench: 838 ns/iter (+/- 49)
test bench_write_value ... bench: 813 ns/iter (+/- 7)
```

kekeimiku at 2023-12-09 04:06:03
The macro1 and macro2 versions should be mostly identical now, thanks to format args flattening. Sadly, it's still not as fast as the non-macro version. Mara has recently posted an awesome blog post where she explains the different trade-offs of format args macros. It's not practical to perfectly expand the macros everywhere, because while it could improve runtime performance, it could also hurt compile times and binary size a lot. That being said, in this simple case, where only a string is being written and only `Display` is used, it would be great if we could optimize it better.

Jakub Beránek at 2023-12-09 06:49:29