export_name with unusual utf8 breaks new version script based linker

c6c54d3
Opened by m4b at 2024-09-21 08:28:53

Unfortunately it looks like the awesome changes in #38117 caused breakage while linking when a weird export name is used (probably due to the version script requiring ascii, or some other esoterica):

        #[export_name="bad_∢"]
        pub extern fn bad(i: usize) {}

NOTE: I haven't checked this particular version, but certain combinations of values cause a linker error complaining about invalid characters in the version script.

  1. I guess, we should just restrict export_name to ASCII.

    cc @rust-lang/compiler

    Michael Woerister at 2016-12-08 23:06:55

  2. So this is an artificial restriction, with no technical merit. I'd really like utf8 support for symbol names, just on principle.

    Nevertheless, it doesn't work as is now (and it used to), albeit entirely because of a broken linker toolchain, so ASCII might be required... IIRC, Swift re-encodes UTF symbols in its name mangler, but that's not the same as real utf8 symbol names in the binary, which is just literally the coolest. I don't know of another language that allows that and can run (and the dynamic linker doesn't have any problem with it, because it just sees null-terminated bytes, which utf8 preserves).

    m4b at 2016-12-08 23:22:00
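
The point about null-terminated bytes can be checked directly. The following is a minimal sketch, not part of the original report: the helper name `fits_in_c_string` is made up for illustration, and the name used is the `bad_∢` example from the top of the issue, written with an escape for U+2222. UTF-8 only emits a zero byte for U+0000 itself, so any name without that character survives as a C string.

```rust
use std::ffi::CString;

/// Hypothetical helper: true if the UTF-8 bytes of `name` contain no NUL,
/// i.e. the name can be stored as a null-terminated string unchanged.
fn fits_in_c_string(name: &str) -> bool {
    !name.as_bytes().contains(&0)
}

fn main() {
    // "bad_∢" from the report; U+2222 encodes to three non-zero bytes.
    let name = "bad_\u{2222}";
    assert!(fits_in_c_string(name));

    // CString::new only fails on interior NUL bytes, so this succeeds.
    let c = CString::new(name).expect("no interior NUL");
    println!("{} bytes before the terminator", c.as_bytes().len());
}
```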

  3. With which linkers do you run into the problem?

    Michael Woerister at 2016-12-12 18:10:58

  4. Both gold and ld (-fuse-ld=bfd) are broken for me:

    #[export_name="󠆷∀🢫"]
    #[no_mangle]
    pub extern fn whatever() {
        println!("nothing");
    }
    

    I guess unicode is too hard for C ppls.

    rustc --crate-type=cdylib src/lib.rs 
    error: linking with `cc` failed: exit code: 1
      |
      = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-L" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "lib.0.o" "-o" "liblib.so" "-Wl,--version-script=/tmp/rustc.AxbnbIEKM7Z3/list" "-Wl,--gc-sections" "-nodefaultlibs" "-L" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "-Wl,-Bdynamic" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-b4054fae3db32020.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/librand-1c6ed188684e7d33.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcollections-63f7707126c5a809.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_unicode-a9711770523833d4.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-d2ecc8049920bea8.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-5837d7d3490e00c5.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-0720511b45a7223a.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc_system-34e7f110f175a258.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-ab203041f1ec5313.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-93f19628b61beb76.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-35d2bc471c7ce467.rlib" "-l" "dl" "-l" "rt" "-l" "pthread" 
"-l" "gcc_s" "-l" "pthread" "-l" "c" "-l" "m" "-l" "rt" "-l" "util" "-shared"
      = note: /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\363' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\240' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\206' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\267' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\342' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\210' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\200' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\360' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\237' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\242' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: ignoring invalid character `\253' in script
    /usr/bin/ld:/tmp/rustc.AxbnbIEKM7Z3/list:3: syntax error in VERSION script
    collect2: error: ld returned 1 exit status
    
    rustc --crate-type=cdylib -C link-args="-fuse-ld=gold" src/lib.rs 
    error: linking with `cc` failed: exit code: 1
      |
      = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-m64" "-L" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "lib.0.o" "-o" "liblib.so" "-Wl,--version-script=/tmp/rustc.5LjCdyxrgOsb/list" "-Wl,--gc-sections" "-nodefaultlibs" "-L" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "-Wl,-Bdynamic" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-b4054fae3db32020.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/librand-1c6ed188684e7d33.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcollections-63f7707126c5a809.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_unicode-a9711770523833d4.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-d2ecc8049920bea8.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-5837d7d3490e00c5.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-0720511b45a7223a.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc_system-34e7f110f175a258.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-ab203041f1ec5313.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-93f19628b61beb76.rlib" "/home/m4b/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-35d2bc471c7ce467.rlib" "-l" "dl" "-l" "rt" "-l" "pthread" 
"-l" "gcc_s" "-l" "pthread" "-l" "c" "-l" "m" "-l" "rt" "-l" "util" "-shared" "-fuse-ld=gold"
      = note: /usr/bin/ld.gold: error: /tmp/rustc.5LjCdyxrgOsb/list:3:5: invalid character
    /usr/bin/ld.gold: error: /tmp/rustc.5LjCdyxrgOsb/list:3:5: syntax error, unexpected $end, expecting STRING or QUOTED_STRING or EXTERN
    /usr/bin/ld.gold: fatal error: unable to parse version script file /tmp/rustc.5LjCdyxrgOsb/list
    collect2: error: ld returned 1 exit status
    

    m4b at 2016-12-21 06:38:24
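
The eleven octal escapes in the ld output above look like the raw UTF-8 bytes of the export name, reported one at a time. A small sketch to confirm that, assuming the name is the three code points U+E01B7, U+2200 and U+1F8AB (which is what the escapes decode to); the function name `octal_escapes` is hypothetical:

```rust
/// Formats every non-ASCII byte of `name` as a backslash followed by
/// three octal digits, the same notation ld uses in its diagnostics.
fn octal_escapes(name: &str) -> Vec<String> {
    name.bytes()
        .filter(|b| !b.is_ascii())
        .map(|b| format!("\\{:03o}", b))
        .collect()
}

fn main() {
    // Assumed export name, written with escapes so the bytes are unambiguous.
    let name = "\u{E01B7}\u{2200}\u{1F8AB}";
    for esc in octal_escapes(name) {
        println!("{esc}");
    }
}
```

Each byte at or above 0x80 is rejected individually, which suggests the version script lexer is working on single bytes rather than decoded characters.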

  5. I think we should report this as a bug against GNU LD, unless there is a way to quote symbol names.

    Demi Marie Obenour at 2017-03-01 22:03:12

  6. Triage: not aware of any changes

    Steve Klabnik at 2018-11-02 16:39:39

  7. Worth noting, this is not restricted to Unicode. Any non-alphanumeric characters like : or a space break it too. This is a problem for us as our architecture involves WASM functions named canister_query foo or canister_update foo. When developing on Linux, you can cargo build --target wasm32-unknown-unknown without errors, but cargo build produces a giant linker error; this most commonly comes up when swapping cargo check with cargo rustc.

    Adam Spofford at 2023-05-12 20:24:30

  8. Why are you using spaces in the symbol names instead of something like canister_query__foo or _ZN14canister_query3foo? Spaces are not portable across all targets even ignoring version scripts afaik. Symbol mangling is done by C++ and Rust because a lot of characters are not portable.

    bjorn3 at 2023-05-12 20:42:25

  9. Because then there is absolutely no ambiguity about what's an intended export of the wasm module vs random compiler (or user-written!) garbage, and because you can easily write it or interpret it by hand, and because we don't have to be portable when the only intended target is wasm32 and wasm32 supports it. It works on Mac, it works on Linux with rustc 1.2.0, but it doesn't work on Linux with current rustc.

    Adam Spofford at 2023-05-12 21:15:52

  10. If you want to avoid ambiguity just add a random string to it. I did expect spaces to not work with GCC and when using an external assembler with LLVM (as rustc used to do for some targets due to LLVM missing an internal assembler for them)

    bjorn3 at 2023-05-12 21:26:17

  11. This needs to be fixed in linkers.

    Demi Marie Obenour at 2023-05-13 12:17:34

  12. It might be worth a separate lint at the same time... I don't believe linker behavior will be tweaked in the near future.

    Charles Lew at 2024-09-21 01:44:19
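
A rough sketch of what such a lint could check, under a loud assumption: the safe set here (ASCII alphanumerics plus `_`) is a conservative guess based on the reports in this thread, not the actual grammar accepted by any linker, and `suspicious_chars` is a made-up name rather than anything in rustc.

```rust
/// Hypothetical lint helper: returns the characters of `name` that fall
/// outside a conservative set (ASCII letters, digits and underscore).
/// The set is an assumption, not a statement of what linkers accept.
fn suspicious_chars(name: &str) -> Vec<char> {
    name.chars()
        .filter(|c| !(c.is_ascii_alphanumeric() || *c == '_'))
        .collect()
}

fn main() {
    for name in ["canister_query foo", "bad_\u{2222}", "canister_query__foo"] {
        let bad = suspicious_chars(name);
        if bad.is_empty() {
            println!("{name:?}: ok");
        } else {
            println!("{name:?}: may break version-script based linking: {bad:?}");
        }
    }
}
```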

  13. Why is a linker script needed?

    Demi Marie Obenour at 2024-09-21 08:24:59

  14. One can have a C function with a non-ASCII name, FYI.

    Demi Marie Obenour at 2024-09-21 08:25:24

  15. We are using a version script, not a linker script. This is to tell the linker which symbols to export from the dylib/cdylib.

    bjorn3 at 2024-09-21 08:28:53
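
For readers unfamiliar with version scripts, here is a sketch of how such an export list could be generated. The layout is an assumption modeled on the usual GNU ld version script form (a `global:` list followed by `local: *;`), not a copy of what rustc writes; it is at least consistent with the logs above, where both linkers fail at line 3 and gold reports column 5. The function name `version_script` is hypothetical.

```rust
/// Builds a version script that exports `symbols` and hides everything else.
/// Names are written unquoted, which is where unusual characters get rejected.
fn version_script(symbols: &[&str]) -> String {
    let mut out = String::from("{\n");
    if !symbols.is_empty() {
        out.push_str("  global:\n");
        for sym in symbols {
            // One exported symbol per line, indented four spaces.
            out.push_str(&format!("    {sym};\n"));
        }
    }
    // Everything not listed above stays local to the shared object.
    out.push_str("\n  local:\n    *;\n};\n");
    out
}

fn main() {
    print!("{}", version_script(&["whatever"]));
}
```

With a single export, the symbol name lands on line 3 of the file after four spaces of indentation, matching the `list:3:5` position in the gold error.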