Intern strings in metadata
The lack of string interning massively bloats rlib metadata. libwinapi.rlib, for example, takes up 54MB, mostly because strings are repeated needlessly, which is exacerbated by the sheer length of many of the identifiers. Simple greps of the file indicate that basically all identifiers are repeated multiple times. Even something as simple as a constant that is never referenced has its name repeated at least 3 times. Interning strings would yield massive space savings.
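To make the idea concrete, here is a minimal sketch of what interning means, not how rustc actually encodes metadata. `StringTable`, `intern`, `get`, `table_bytes`, and `encoded_sizes` are hypothetical names made up for illustration, and the 4-byte index size is an assumption. Each distinct string is stored once, and every other occurrence becomes a small integer index into the table.

```rust
use std::collections::HashMap;

/// Hypothetical string table (not a rustc type): each distinct string is
/// stored once and referred to by a small integer index everywhere else.
#[derive(Default)]
struct StringTable {
    indices: HashMap<String, u32>,
    strings: Vec<String>,
}

impl StringTable {
    /// Returns the index for `s`, adding it to the table on first use.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.indices.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.indices.insert(s.to_owned(), idx);
        idx
    }

    /// Looks up the string behind an index returned by `intern`.
    fn get(&self, idx: u32) -> &str {
        &self.strings[idx as usize]
    }

    /// Bytes needed to store every distinct string exactly once.
    fn table_bytes(&self) -> usize {
        self.strings.iter().map(|s| s.len()).sum()
    }
}

/// Returns (bytes if every occurrence is written out in full,
/// bytes if each occurrence is a 4-byte index plus one shared table).
fn encoded_sizes(names: &[&str]) -> (usize, usize) {
    let mut table = StringTable::default();
    let naive: usize = names.iter().map(|n| n.len()).sum();
    let refs: Vec<u32> = names.iter().map(|n| table.intern(n)).collect();
    let interned = table.table_bytes() + refs.len() * std::mem::size_of::<u32>();
    (naive, interned)
}

fn main() {
    // A made-up long identifier, written three times as in the constant case.
    let name = "SOME_VERY_LONG_WINAPI_STYLE_CONSTANT_NAME";
    let (naive, interned) = encoded_sizes(&[name, name, name]);
    println!("without interning: {naive} bytes, with interning: {interned} bytes");

    let mut table = StringTable::default();
    let idx = table.intern(name);
    println!("index {idx} resolves to {}", table.get(idx));
}
```

The longer the identifier and the more often it repeats, the more the fixed-size index wins over writing the string out again, which is why long winapi-style names make the current bloat so visible.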
cc @eddyb who helped in figuring this out.
It would certainly be interesting to look into this.
Michael Woerister at 2016-04-04 14:30:11
Constant names are repeated 3 times: once for DefKey, once for item.name, and once for the AST. I hacked together a patch to remove the duplication in DefKey, and it saved 1 MB (2%) for winapi. Removing the duplication in the AST seems harder.
Seo Sanghyeon at 2016-04-11 16:26:07
This is now done for symbols: https://github.com/rust-lang/rust/blob/bef6ff618f17398775e9d8cd23a6f47d983869dc/compiler/rustc_metadata/src/rmeta/encoder.rs#L318-L331
bjorn3 at 2023-07-15 13:17:12