Reconsider using SVH for loading transitive crate dependencies

2bee4f7
Opened by Alex Crichton at 2022-04-22 14:49:50

When you write extern crate bar in your crate then the compiler will search for and load all crates that are transitive dependencies of bar. Let's say, for example, that bar depended on crate foo. The compiler will look for libfoo.rlib after loading libbar.rlib. Because bar is already compiled the compiler wants to find the precise same libfoo.rlib as before. Each crate has a "strict version hash" historically (SVH) which is used for this. The metadata of bar says that it needs crate foo at a precise SVH. The intention here is to prune duplicates or other crates that look like they could satifsy the request.

Over time, though, the SVH's definition has changed over time and it may no longer be the correct tool for the job here. @nikomatsakis may have more information.

  1. cc @michaelwoerister

    Niko Matsakis at 2017-06-15 14:17:33

  2. So we talked this over once before. My impression at the time was that we were going to change the model so that we had two inputs:

    • the crate-name
    • some metadata (e.g. a crates.io version number, but also a bit of other stuff)

    and that the combination of these two would uniquely identify the crate within the universe of crates. Hence, if we find some ambiguity, that implies that the user combined files that they ought not to (i.e., combined two universes of crates).

    The SVH, meanwhile, was just going to be used for verification purposes or by the incremental code if it wanted a crude idea of "did something change". (i.e., we could use the SVH to check that the crate with a given name/disambiguator has not changed, if we wanted to).

    One benefit of this model:

    • Imagine that crate A depends on crate B, and you've already compiled both of them.
    • Then crate B changes (hence its SVH changes). Crate A is re-compiled, but we see that nothing depends on crate B.
    • Under this current model where the SVH is part of crate search, crate A must be regenerated to some extent -- at least the SVH it embeds for Crate B must be updated.
    • Under the model I was proposing, the artifact for crate A could theoretically be left completely unchanged.

    Niko Matsakis at 2017-06-15 14:21:46

  3. Gah, I can't find the previous discussion. I guess it was more symbol naming than the SVH per se?

    Niko Matsakis at 2017-06-15 14:23:53

  4. There's been some symbol-naming/SVH discussion in the RFC repo somewhere.

    Michael Woerister at 2017-06-15 14:24:36

  5. The SVH includes the crate name and the -C metadata arguments though, right? In that sense it seems to me that the SVH is still appropriate for use here?

    I think it's fine to say that "name + -Cmetadata" must uniquely identify. The bugfix to Cargo was a case where this wasn't true because two distinct crates has the same name/metadata.

    Alex Crichton at 2017-06-15 14:26:58

  6. (morally I agree though that name+metadata should be all we use to load/match crates with)

    Alex Crichton at 2017-06-15 14:27:17

  7. The SVH includes the crate name and the -C metadata arguments though, right? In that sense it seems to me that the SVH is still appropriate for use here?

    Well, it's not wrong to use, but it's stronger than the key that we truly want, and hence it means that we might overlook cases where there would be an ambiguity because it winds up being resolved by this extra info that we are not supposed to be paying attention to.

    Niko Matsakis at 2017-06-15 18:18:44

  8. In other words, it might mask the fact that the "metadata" is not sufficiently unique.

    Niko Matsakis at 2017-06-15 18:19:00

  9. Ok makes sense to me! Sounds like we should change the usage of SVH in crate loading to a hash of metadata+crate name?

    Alex Crichton at 2017-06-15 18:21:59

  10. That would be my preference, yeah. Would that require further changes to cargo?

    Niko Matsakis at 2017-06-15 18:44:54

  11. Nah should be able to work as-is!

    Alex Crichton at 2017-06-15 18:47:55

  12. Discussed at T-compiler backlog bonanza.

    It seems like there is still potential interest in investing effort here, though the main people who should own it may not have time to work on it right now.

    The main thing is that its not a C-tracking-issue though, in our opinion.

    @rustbot label: -C-tracking-issue

    Felix S Klock II at 2022-04-22 14:49:48