linking staticlib files into shared libraries exports all of std::

Opened by Nathan Froyd at 2022-12-07 20:52:26

Consider this toy example:

sl.rs:

```rust
#[no_mangle]
pub extern "C" fn hello() {
    println!("hello world");
}
```

driver.cpp:

```cpp
extern "C" {
  void hello();
}

void
really_hello()
{
  hello();
}
```

Compile and link:

$ rustc --crate-type staticlib --emit link=sl.a sl.rs
$ g++ -o hello.so -fPIC -shared driver.cpp sl.a

With Rust 1.8.0, we get:

$ ls -l hello.so
-rwxr-xr-x 1 froydnj froydnj 2141544 Apr 26 11:02 hello.so

which is quite large (2MB!) for such a simple program. Despite all of std being compiled with the moral equivalent of -ffunction-sections, adding -Wl,--gc-sections does very little to slim down the binary:

$ g++ -o hello.so -fPIC -shared driver.cpp sl.a -Wl,--gc-sections
$ ls -l hello.so
-rwxr-xr-x 1 froydnj froydnj 2141544 Apr 26 11:02 hello.so

That's only about 400 bytes eliminated, which seems suboptimal.

The problem is that all of the public functions in libstd.rlib are marked as global symbols. When sl.a is linked into a shared library, all of those global symbols from libstd.rlib are treated as symbols that the new shared library should export as publicly visible symbols. This creates bloat in the form of a large PLT that the shared library must carry around, and it renders -Wl,--gc-sections ineffective, because virtually everything is transitively reachable from these public libstd.rlib functions. hello.so has ~5000 visible functions when it should really have only a handful. It even contains code for parsing floating-point numbers, although nothing in the functions shown above parses a float.
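
To see why exporting every global symbol defeats -Wl,--gc-sections, here is a minimal Rust sketch of the mark phase of section garbage collection: every exported symbol is a root, and anything reachable from a root is kept. The call graph and the symbol names (std_print, std_parse_f64, and so on) are hypothetical stand-ins, not the real contents of libstd.rlib, and the model works per symbol rather than per section.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical call graph: each symbol maps to the symbols it references.
/// Names are illustrative stand-ins, not the real contents of libstd.rlib.
fn call_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("really_hello", vec!["hello"]),
        ("hello", vec!["std_print"]),
        ("std_print", vec!["std_stdout", "std_write_fmt"]),
        ("std_stdout", vec![]),
        ("std_write_fmt", vec![]),
        ("std_parse_f64", vec!["std_parse_f64_slow"]),
        ("std_parse_f64_slow", vec![]),
        ("std_fs_read", vec!["std_open"]),
        ("std_open", vec![]),
    ])
}

/// Mark phase of a --gc-sections-style pass: keep every symbol reachable
/// from a root; anything left unmarked could be discarded.
fn reachable(
    graph: &HashMap<&'static str, Vec<&'static str>>,
    roots: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut kept = BTreeSet::new();
    let mut stack = roots.to_vec();
    while let Some(sym) = stack.pop() {
        // insert() returns false if the symbol was already marked.
        if kept.insert(sym) {
            if let Some(refs) = graph.get(sym) {
                stack.extend(refs.iter().copied());
            }
        }
    }
    kept
}

fn main() {
    let graph = call_graph();

    // Default visibility: every global (including all of std's) is exported,
    // so every symbol is a GC root and nothing can be collected.
    let all_globals: Vec<&'static str> = graph.keys().copied().collect();
    println!(
        "all globals exported: keep {} of {}",
        reachable(&graph, &all_globals).len(),
        graph.len()
    );

    // Only the intended entry point is exported: the float parser and the
    // file-reading code become unreachable and can be dropped.
    let kept = reachable(&graph, &["really_hello"]);
    println!("only really_hello exported: keep {:?}", kept);
}
```

The toy suggests the remedy is a smaller root set rather than a smarter collector.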

This example is admittedly contrived, but Firefox's use of Rust is not terribly dissimilar from this: we compile all the crates we use into rlibs, link all of the rlibs together into a staticlib, and then link the staticlib into our enormous shared library, libxul. We're pretty careful with symbol visibility; we have hundreds of thousands of symbols in libxul, but fewer than 500 exported symbols. We would very much like it if:

  1. libxul didn't suddenly grow thousands of newly-exported symbols overnight.
  2. libxul didn't contain Rust code from std (or otherwise) that it doesn't use.

We didn't think terribly hard about this when we enabled Rust on our desktop platforms (though we should have), but our Android team cares quite a bit about binary size, and Rust support taking up this much space would be a hard blocker on our ability to ship Rust on Android. The overhead would be somewhat smaller than the figures above because we'd be compiling for ARM, but it would still be significant. (I assume the situation is similar on Mac and Windows, though I haven't checked.)

cc @alexcrichton @rillian @glandium

  1. > We're pretty careful with symbol visibility; we have hundreds of thousands of symbols in libxul, but fewer than 500 exported symbols.

    I'm curious, how do y'all end up doing this? We have a few options in Rust for what's going on here, but it may not quite overlap with what you're doing:

    1. First up, you can compile with LTO (-Clto) when creating a staticlib. This will internalize as much as possible on the Rust side and LLVM is basically doing --gc-sections at that point. This gives me a 194K shared object.

    2. Next, if you have a whitelist of symbols, you can use a linker script like this (a GNU ld version script, passed with -Wl,--version-script=<file>):

      {
        global:
          really_hello;
        local: *;
      };
      

      For me that generates a 102K shared library without LTO, and 72K with LTO.

    3. We can tweak the visibility of symbols by default in Rust (similar to https://github.com/rust-lang/rust/issues/32887, I think?), but unfortunately I don't know much about the visibility choices across platforms, so I'm not sure whether this would actually help.

    Those are some ideas off the top of my head at least, but depending on what Gecko is already doing we can likely do something similar :)

    Alex Crichton at 2016-04-26 21:12:56
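
The version script in the comment above can also be generated from a list of exported names instead of being written by hand. This is a minimal sketch; `version_script` is a hypothetical helper, and its output assumes GNU ld's anonymous version-script syntax (the same `global:` / `local: *;` form shown above).

```rust
/// Hypothetical helper: build an anonymous GNU ld version script. The listed
/// symbols stay exported; `local: *;` demotes every other global to local.
fn version_script(globals: &[&str]) -> String {
    let mut out = String::from("{\n  global:\n");
    for sym in globals {
        out.push_str("    ");
        out.push_str(sym);
        out.push_str(";\n");
    }
    out.push_str("  local: *;\n};\n");
    out
}

fn main() {
    // Print the script; redirect it to a file such as exports.map.
    print!("{}", version_script(&["really_hello"]));
}
```

Linking with `g++ -o hello.so -fPIC -shared driver.cpp sl.a -Wl,--version-script=exports.map` should then hide everything except really_hello. For C++ entry points, GNU ld version scripts also accept an `extern "C++" { ... };` block whose patterns match demangled names, which may ease the mangling concern raised later in the thread.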

  2. cc @brson, @rust-lang/tools

    Alex Crichton at 2016-04-26 21:13:31

  3. > I'm curious, how do y'all end up doing this?

    We compile with -fvisibility=hidden and wrap all the system headers we use to ensure that visibility is restored to the default visibility for any symbols in system headers. We don't have to wrap symbols on Darwin, though; perhaps the compiler ensures things in system headers have the appropriate visibility?

    On Windows I believe we don't have to do any of this because the default is to have local symbols in the library and you explicitly export what you want. (We wrap STL headers on Windows, but for different reasons.)

    > First up, you can compile with LTO (-Clto) when creating a staticlib.

    Ah, that's super-useful all on its own! We should definitely start doing this.

    > Next, if you have a whitelist of symbols, you can use a linker script like this... For me that generates a 102K shared library without LTO, and 72K with LTO.

    Is that with -Wl,--gc-sections enabled when linking? Is the LTO you speak of here applied at staticlib generation time, or at link time for the final shared library?

    I don't think a linker script would work for us, because of the messiness of specifying the appropriate symbols (C++ symbol mangling, lovely), but it might be worth investigating.

    > We can tweak the visibility of symbols by default in Rust

    That was the possibility I initially thought of, but then I assume you'd have to compile objects separately for shared vs. static libraries or play weird linking tricks.

    The number of symbols that std exports seems rather high (~5K), but I think that has to do with things like exporting functions for all the arithmetic operations on every arithmetic type, and various partial/total-ordering trait operations. I feel like we shouldn't have to do that in general, even if LTO would take care of dead code elimination for us.

    Nathan Froyd at 2016-04-26 22:00:28

  4. > (I assume the situation is similar on Mac and Windows, though I haven't checked.)

    This is not really an issue on Windows with msvc due to symbols only being exported from a DLL if they are marked as dllexport or if they are included as part of a .def file (unless nothing is specified via those methods in which case everything is exported). Anything which isn't referenced by those exported symbols is stripped by the linker. While Rust doesn't emit dllexport, it does emit a .def file when it controls creation of the DLL, which will make for very slim DLLs when using the upcoming cdylib crate type.

    Peter Atashian at 2016-04-27 05:54:43

  5. @froydnj

    Awesome, thanks for the info! I was discussing symbol visibility with @brson a bit today, and our prospects for something like changing the default to hidden visibility may be bleak (due to backcompat concerns now). In general, though, it seems that not a lot of thought has gone into the visibility of symbols in Rust beyond "internal or public", and hidden/protected/default linkage (at least in LLVM terms) is a whole new suite of choices within the "this is a public symbol" option.

    All that is basically to say that the compiler probably can't generate hidden symbols today (LLVM certainly can; we just haven't exposed it), and it's not yet clear how we'd want to do that. It should be possible in the long term, of course!

    > Ah, that's super-useful all on its own! We should definitely start doing this.

    One note about LTO is that it retains all #[no_mangle] non-Rust ABI reachable entry points, so I believe this would mean that the Rust C API would still all go into the PLT for example.

    > Is that with -Wl,--gc-sections enabled when linking?

    Yeah, although once you enable Rust LTO the --gc-sections option shouldn't actually do much else (unless you rely on it for stripping C code)

    > Is the LTO you speak of here applied at staticlib generation time, or at link time for the final shared library?

    Ah yeah so to clarify, Rust LTO isn't like C LTO where we have a special object file format or something like that and the linker takes care of it. Rather Rust LTO is our way of saying "take the whole world of Rust code, optimize it all together, then emit one object file". We do this by loading LLVM bytecode from rlibs, internalizing all symbols, then throwing it at the LLVM optimizer.

    So to answer your question, this LTO happens at staticlib generation time. The .a archive will just have a smaller object file (as much of it will be stripped), and that object shouldn't necessarily be special in any way.

    > That was the possibility I initially thought of, but then I assume you'd have to compile objects separately for shared vs. static libraries or play weird linking tricks.

    Yeah right now we use the same object file for libstd.rlib as well as libstd.so, but these would have very different symbol visibility requirements, so that's problem 1 we'd have to solve :(

    > The number of symbols that std exports seems rather high (~5K), but I think that has to do with things like exporting functions for all the arithmetic operations on every arithmetic type, and various partial/total-ordering trait operations.

    Right, yeah, most of this stuff is just for future monomorphizations. For example, every format string in Rust (e.g. format!("foo: {}", "bar")) generates a static symbol describing the parsed format string. If that format string is in a generic function, then monomorphizations of the generic function need to reference the format string, so the symbol is made public (and right now we have only one level of "public", which, as you've found, means "in the PLT").

    Basically, these symbols are just a ton of internal implementation details, and it should be totally fine to hide all of them as long as other Rust code can still link to them (i.e., this sounds exactly like hidden visibility).

    Alex Crichton at 2016-04-27 06:32:43
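
Hidden visibility, as described above, splits "public" into "visible to the rest of the link" and "exported from the final .so". The sketch below models the four ELF cases as a Rust enum. It is a simplification of the ELF binding and visibility rules, and the symbol names in `main` are hypothetical, not what rustc actually emits.

```rust
/// Simplified model of ELF symbol binding and visibility, limited to what
/// matters when a staticlib is linked into a shared object.
#[allow(dead_code)] // not every variant is constructed in this sketch
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Linkage {
    /// STB_LOCAL ("internal"): visible only within its own object file.
    Local,
    /// STB_GLOBAL + STV_HIDDEN: other objects in the same link can resolve
    /// it, but it is not exported from the resulting .so.
    Hidden,
    /// STB_GLOBAL + STV_PROTECTED: exported, but references from inside
    /// the .so always bind to this definition.
    Protected,
    /// STB_GLOBAL + STV_DEFAULT: exported and preemptible, so calls from
    /// inside the .so may go through the PLT.
    Default,
}

impl Linkage {
    /// Can another object file in the same link refer to this symbol?
    fn usable_from_other_objects(self) -> bool {
        self != Linkage::Local
    }

    /// Does the symbol end up in the .so's dynamic symbol table?
    fn exported_from_shared_object(self) -> bool {
        matches!(self, Linkage::Protected | Linkage::Default)
    }
}

/// Names of the symbols a shared object would export.
fn exported(syms: &[(&'static str, Linkage)]) -> Vec<&'static str> {
    syms.iter()
        .filter(|(_, l)| l.exported_from_shared_object())
        .map(|&(name, _)| name)
        .collect()
}

fn main() {
    // Hypothetical symbols from the toy example. Today every public Rust
    // symbol behaves like Default; with Hidden, std's implementation details
    // would stay linkable by other Rust code but would not be exported.
    let today = [
        ("hello", Linkage::Default),
        ("std_print", Linkage::Default),
        ("std_parse_f64", Linkage::Default),
    ];
    let with_hidden = [
        ("hello", Linkage::Default),
        ("std_print", Linkage::Hidden),
        ("std_parse_f64", Linkage::Hidden),
    ];
    println!("today:       {:?}", exported(&today));
    println!("with hidden: {:?}", exported(&with_hidden));
    println!(
        "hidden symbols still linkable: {}",
        Linkage::Hidden.usable_from_other_objects()
    );
}
```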

    Adding -Clto to our staticlib build in Gecko reduced the number of T symbols from 5230 to 280. So this does help a lot.

    However, the library file size went from 5033882 to 6321450 bytes.

    Ralph Giles at 2016-04-27 23:47:19

  7. @rillian you're sure optimizations are turned on, right? (e.g. -O)

    Alex Crichton at 2016-04-27 23:49:26

  8. No, that was an unoptimized build. Trying with -O now.

    Ralph Giles at 2016-04-28 00:00:38

  9. Ok, in an opt build -Clto changes the staticlib file size from 4314802 to 1669346 bytes. Thanks for the hint!

    Ralph Giles at 2016-04-28 00:31:35
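
The figures reported above are easier to compare as percentages. This small sketch only does the arithmetic on the numbers quoted in the thread; `pct_change` is a hypothetical helper.

```rust
/// Percentage change from `before` to `after`; negative means it shrank.
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Figures as reported in the thread for Gecko's Rust staticlib.
    println!("T symbols, -Clto (unoptimized): {:+.1}%", pct_change(5230.0, 280.0)); // -94.6%
    println!("size, -Clto (unoptimized):      {:+.1}%", pct_change(5033882.0, 6321450.0)); // +25.6%
    println!("size, -Clto with -O:            {:+.1}%", pct_change(4314802.0, 1669346.0)); // -61.3%
}
```

In other words, LTO without optimizations grew the staticlib by about a quarter, while LTO with -O shrank it by roughly 61%.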

  10. See https://bugzilla.mozilla.org/show_bug.cgi?id=1268547 about turning this on for Gecko.

    Ralph Giles at 2016-04-28 16:43:15

  11. This might be the same underlying problem as in https://github.com/rust-lang/rust/issues/37530.

    Michael Woerister at 2016-11-28 01:36:21

  12. Some more comments on this thread -- https://internals.rust-lang.org/t/rust-staticlibs-and-optimizing-for-size/5746

    Alex Crichton at 2017-08-27 02:49:41