Move HashMap to liballoc
This is blocked on https://github.com/rust-lang/rust/pull/26870
- [ ] Move FnvHasher from src/librustc/util/nodemap.rs to libcollections
- [ ] Expose FnvHasher as DeterministicState
- [ ] Move HashMap's implementation into libcollections defaulting to DeterministicState
- [ ] Unhardcode HashMap's constructors/methods from using RandomState (instead relying on default fallback)
- [ ] Re-export HashMap in std::collections as
pub type HashMap<K, V, S = RandomState> = core_collections::HashMap<K, V, S> - [ ] Do the same for HashSet
- [ ] Re-export DeterministicState in std as well (so we have a better answer to "HashMap is slow")
- [ ] Document the performance-stability/security tradeoff between DeterministicState and RandomState, and that users of the std facade will get a different default from direct users of libcollections (because RandomState requires OS rng to seed its state)
I am willing to mentor this if someone else wants to do it.
Note that re-exporting HashMap may be the trickiest one; it's not clear to me that it's trivial, and may involve manually re-exporting everything in std::collections::hash_map to get the right behaviour (annoying and potentially fragile).
This is blocked on https://github.com/rust-lang/rust/pull/26870
- [ ] Move FnvHasher from src/librustc/util/nodemap.rs to libcollections
- [ ] Expose FnvHasher as DeterministicState
- [ ] Move HashMap's implementation into libcollections defaulting to DeterministicState
- [ ] Unhardcode HashMap's constructors/methods from using RandomState (instead relying on default fallback)
- [ ] Re-export HashMap in std::collections as
pub type HashMap<K, V, S = RandomState> = core_collections::HashMap<K, V, S> - [ ] Do the same for HashSet
- [ ] Re-export DeterministicState in std as well (so we have a better answer to "HashMap is slow")
- [ ] Document the performance-stability/security tradeoff between DeterministicState and RandomState, and that users of the std facade will get a different default from direct users of libcollections (because RandomState requires OS rng to seed its state)
I am willing to mentor this if someone else wants to do it.
Note that re-exporting HashMap may be the trickiest one; it's not clear to me that it's trivial, and may involve manually re-exporting everything in std::collections::hash_map to get the right behaviour (annoying and potentially fragile).
<!-- TRIAGEBOT_START --> <!-- TRIAGEBOT_ASSIGN_START --> <!-- TRIAGEBOT_ASSIGN_DATA_START$${"user":"clarfonthey"}$$TRIAGEBOT_ASSIGN_DATA_END --> <!-- TRIAGEBOT_ASSIGN_END --> <!-- TRIAGEBOT_END -->rustbot at 2023-05-16 05:10:58
CC @jroesch to confirm that this will work correctly with your design
Aria Desires at 2015-07-23 18:59:54
Yeah this seems to be completely compatible with the current design of defaults. Let me know if there are any issues doing this.
Jared Roesch at 2015-07-23 20:25:04
I would also be fine just not having a default in libcollections and eschew the DeterministicState entirely.
Alex Crichton at 2015-07-23 20:40:11
I would like to have DeterministicState in std because it's painful to explain that if you want a fast HashMap you need to use such-and-such external crate.
Aria Desires at 2015-07-23 21:11:35
Perhaps, yeah, but I also don't mind expanding
std::hashto just have more hashing algorithms. We useRandomStateas a name so we're not committing to an algorithm, but that's basically just because it's the default. If you have to explicitly opt-in to an algorithm it seems fine to just expose the names themselves.Alex Crichton at 2015-07-23 23:15:23
Oh yeah, I'm fine if it's also just straight up FnvState or whatever. Just want a secure and a fast option.
Aria Desires at 2015-07-24 00:09:59
#26870 landed, but you want a snapshot before you can start moving things around.
Eduard-Mihai Burtescu at 2015-07-26 13:15:55
Thanks, snapshot landed just now. I would be happy to do this.
Peter Blackson at 2015-07-28 14:23:07
It sounds like we can't while the feature is gated?
Aria Desires at 2015-07-28 14:34:12
Niko and I talked today and the feature becoming ungated is a decision the core team has to make. We are already going to make a couple tweaks to the algorithm this week, looking forward to this being stable ;)
Jared Roesch at 2015-07-29 01:18:02
@Gankro I'd like to work on this in whatever limited capacity I can, if you're still willing to mentor
Panashe Fundira at 2016-08-22 03:54:49
Looks like I unassigned you? oops
Panashe Fundira at 2016-08-22 03:55:13
@munyari Sadly I don't think progress can be made on this without adopting some strategy for type parameter defaulting.
Eduard-Mihai Burtescu at 2016-08-22 03:57:14
libcollections is gone, so I believe this can be closed. @Gankro if that's wrong please let me know!
Steve Klabnik at 2018-12-10 19:05:19
@steveklabnik all the collections now live in liballoc instead, except for HashMap which is still in std for the same reason it wasn't in libcollections. Please re-open.
jethrogb at 2018-12-11 02:59:41
Since libstd now uses hashbrown, is this still relevant/doable?
Camille Gillot at 2020-04-18 21:50:46
So I didn't explain this well, but historically the main blocker was the the default random hasher relied on thread-locals to allow HashMap instances on the same thread to share a seed (since generating it was expensive, and this was regarded as sufficient to guard against HashDOS).
Thread-local storage is a feature "exclusive" to std, which liballoc (then libcollections) wasn't allowed to use.
That said it looks like things may have changed a lot with how hashbrown does seeding. @Amanieu: is it now trivial for us to move HashMap to libcollections? Or does the std shim still use the thread-localy stuff?
Aria Desires at 2020-04-22 18:35:13
hashbrown itself is
#[no_std]since it uses a non-HashDOS-safe hasher. The std shim is what adds the SipHash hasher which depends on randomness and TLS.Personally, I've just been using hashbrown directly in my
no_stdcode.Amanieu d'Antras at 2020-04-22 18:40:09
This is certainly still relevant and hashbrown hasn't made it more doable. The main reason why this hasn't already been done is the random state for the secure hasher. Thread local storage is only part of the story, the main issue is a that a cryptographically-secure randomness source is needed to initialize the state. This is not available in (the platform-independent) libcore/liballoc. This hasn't changed with hashbrown.
Btw, the tasks in the issue description need updating from the pre-1.0 era.
Yes, you can use hashbrown in
no_std, but library crates that use that in their public APIs are not compatible with thestdHashMap.jethrogb at 2020-04-24 09:49:12
A blocker I saw is that hashbrown itself depends on liballoc. Therefore, making a hashbrown-powered HashMap in liballoc is impossible. A way out would be to push the libstd compatibility layer to hasbrown itself, and make libstd only do reexports and type aliases. I do not know how this interacts with stability annotations.
Camille Gillot at 2020-04-24 10:14:11
We can figure out how to handle the cyclic dependency. That's not a problem.
jethrogb at 2020-04-24 10:23:14
Any way I could help getting this moving ?
GuillaumeDIDIER at 2020-07-10 09:41:51
@GuillaumeDIDIER you can try to convince the powers-that-be that something like extern existensials need to be added to the language.
jethrogb at 2020-07-10 09:48:21
@rustbot claim
Kisaragi at 2023-05-16 05:10:56
an idea I have for moving
HashMaptoalloc: moveRandomStatetoalloctoo and makenewjust call an extern function something like#[panic_handler]whereallochas a default implementation based on a global atomic iif#[cfg(target_has_atomic = "ptr")],stdoverrides that default. programs withoutstdcan supply their own implementation, programs must supply their own implementation if#[cfg(not(target_has_atomic = "ptr"))]Jacob Lifshay at 2023-05-16 09:07:22
That is a breaking change. You can currently use liballoc without providing said implementation.
bjorn3 at 2023-05-16 12:34:06
hmm, well, if
allocalways supplies a default implementation it shouldn't be breaking. the#[cfg(not(target_has_atomic = "ptr"))]version could return a constant or the address of a variable on stack or panic.alternatively, maybe
RandomState::newand all non-generic callers could be marked#[inline]and rustc would be made to only require the extern function to exist ifRandomState::newis transitively used by the user's program.Jacob Lifshay at 2023-05-16 15:52:05
Has any progress been made on this recently? I'd like to help out with this if I can, but want to make sure that I'm not repeating the same work someone's already doing.
Clar Fon at 2023-08-26 05:56:38
Has any progress been made on this recently? I'd like to help out with this if I can, but want to make sure that I'm not repeating the same work someone's already doing.
No; at least, no progress has been made on my side.
However, there are some key-pieces (on zulip thread):
- would we make new HashMap on alloc?: No, we would want both to be same; otherwise it will be not a "solution" to no_std HashMap crates.
- would we want to default hasher with other?: No, it would be a pitfall (the current implementation is defaulting with HashDOS-resistant one)
So, it would be something like:
- Move definition and generic impls into alloc
- Make alias on std, which refers to alloc and defaults with current hasher
- Do something about impl HashMap<K, V, RandomState>
- Create PR and see what happens
Kisaragi at 2023-08-26 23:50:23
In terms of that third point, we already can have incoherent impls in the standard library specifically, so, it should be fine to keep the
impl HashMap<K, V, RandomState>block in libstd while making all of the impls inallocnot referenceRandomStateat all.Clar Fon at 2023-08-27 03:13:35
We can have incoherent inherent impls, not incoherent trait impls.
bjorn3 at 2023-08-27 07:05:02
@rustbot claim
I'll try something soonish to see what I can do.
Clar Fon at 2023-09-02 18:03:47
So, immediate issue detected: right now,
HashMapis implemented by directly including thehashbrowndependency, which itself relies on thealloccrate. This is problematic if we want to include it in thealloccrate itself, since that's a circular dependency.There are honestly two paths forward:
- We move the code for the
hashbrowncrate directly into thealloccrate. This seems like the most reasonable option, since the stated reason for this crate being offered separately at all is to work inno_stdenvironments, and that's the goal of the move. - We create a separate
alloc_internalscrate which contains just the allocator API and its supporting types, and offer the option of making thehashbrowncrate depend on that. I'm not actually sure if this is less intrusive than just moving the code in.
The main consequence of the first option is that this requires effectively moving all development effort for
HashBrowninto the main rust-lang repo, which would require moving over issues and PRs and changing the development process for that. This feels like the proper decision IMHO, but that actual decision should be left to the libs team.The second option does open the doors to move other collection implementations from liballoc into separate crates, but I'm not sure if this is actually something that would be desired.
Either way, I think that this is a sufficient roadblock that would have to be solved by the libs team before work continues on this. I'm willing to help out after a decision is made, but since I personally cannot make that decision, I'll unclaim this until the libs team can have a proper discussion about it. (If any libs team members are watching, nominating this issue for discussion would be appreciated, but I can't actually compel any of you to do anything about it.)
Clar Fon at 2023-09-10 01:15:39
- We move the code for the
imho rather than making a
alloc_internalscrate, that stuff should just be incoresince it's only the interface traits and functions, not actually a global allocator.Jacob Lifshay at 2023-09-10 05:42:09
imho rather than making a
alloc_internalscrate, that stuff should just be incoresince it's only the interface traits and functions, not actually a global allocator.turns out
Allocatoris already incore, what doeshashbrownneed that isn't incore?Jacob Lifshay at 2023-09-10 05:45:41
I took a quick look at hashbrown and here's what it uses
allocfor:- The default type parameter for the
HashMapandHashSetallocators. ToOwnedforHashSet::get_or_insert_owned.handle_alloc_errorfor allocation failures.
This is sufficiently small that it's probably possible to make
hashbrownonly depend on core (with an optional dependency onalloc), though I'm not entirely sure how to deal withhandle_alloc_error. Perhaps we could manually include of copy of the implementation directly inhashbrown? It should be fine to depend on unstable implementation details when building as part of the standard library.Amanieu d'Antras at 2023-09-10 11:33:28
- The default type parameter for the
Is there any actual reason
ToOwnedcan't be defined in core, setting aside the fact that the compiler currently doesn't allow incoherent trait impls?Clar Fon at 2023-09-10 16:09:25
Is there any actual reason
ToOwnedcan't be defined in core, setting aside the fact that the compiler currently doesn't allow incoherent trait impls?I don't think
ToOwnedbelongs outsidealloc. It is quite specific/biased toalloctypes. For example, it considersalloc::string::Stringto be the (one and only) owned version of&str, even though there exist perfectly good alternatives in other crates (e.g. owned string types with small string optimization, or strings that store the length in the allocation, etc.).Mara Bos at 2023-09-21 08:00:45
I would love it if we could find a lang way to allow splitting
Cloneintopub trait CloneFrom<Other: ?Sized> { fn clone_from(&mut self, other: &Other); } pub trait Clone : CloneFrom<Self> where Self: Sized { fn clone(&self) -> Self; }so that that
CloneFromcan be incorefor all the "update my arraystr from that str" cases, and thus reduce some of theToOwnedneeds...scottmcm at 2023-09-21 18:40:24
I don't think
ToOwnedbelongs outsidealloc. It is quite specific/biased toalloctypes. For example, it considersalloc::string::Stringto be the (one and only) owned version of&str, even though there exist perfectly good alternatives in other crates (e.g. owned string types with small string optimization, or strings that store the length in the allocation, etc.).So, my reasoning for this is that while most of the relevant types for this do exist in the alloc crate, there's nothing that explicitly requires it other than the fact that it would be incoherent to defer implementations to the alloc crate.
There currently isn't a supported way to do this besides (maybe) the proposed rust-lang/rfcs#3482, but it feels kind of arbitrary to require that something depend on allocations to itself use the trait. The way it effectively would work is a negative impl in
corethat is "replaced" with a positive impl inalloc, although again, this is absolutely nothing that is currently supported.(This example is probably way beyond the scope of this issue, although I figured I might as well explain what's going on in my head at the moment.)
Clar Fon at 2023-09-21 19:36:13
I would love it if we could find a lang way to allow splitting
Cloneintopub trait CloneFrom<Other: ?Sized> { fn clone_from(&mut self, other: &Other); } pub trait Clone : CloneFrom<Self> where Self: Sized { fn clone(&self) -> Self; }It would be great. This would also eliminate the need for
A: CloneinCloneFromfor collections including Hashbrown.JustForFun88 at 2023-09-26 20:30:03
https://github.com/rust-lang/rust/pull/116113 has a different approach for ?Sized Clone.
David Tolnay at 2023-09-26 21:02:23
Is this true that after this change
std::collections::HashMapwill be vulnerable to that DoS?Askar Safin at 2023-11-05 17:33:23
The only difference is HashMap can be referred from
#![no_std]environment. It will not changestd::collections::HashMap's default hasher (RandomState, which is HashDoS-resistant)Kisaragi at 2023-11-05 17:41:47
@KisaragiEffective, cool!
So,
std::collections::HashMapandalloc::collections::HashMapwill behave slightly differently? This will be very hard-to-discover, so this should be properly documented on both rustdoc pages.Maybe we should call
alloc::collections::HashMapdifferently to make sure nobody will assume it is DoS-resistant?Askar Safin at 2023-11-05 18:48:50
As I understand
alloc::collections::HashMapwill require explicitly passing a hasher (egalloc::collections::HashMap::<u32, String, MyHasher>is a valid type, butalloc::collections::HashMap::<u32, String>is not due to missing the hasher), whilestd::collections::HashMapwill default toRandomStateif you don't explicitly pass a hasher. And unless you depend on libstd,HashMap::new()and other methods which assumeRandomStateto be the used hasher will be unavailable.bjorn3 at 2023-11-05 19:30:22
Poking back in here just to reprocess the current requirements on implementing this.
My understanding is that right now, the issue is that the
hashbrowncrate internally uses liballoc for some of its methods in order to work, and thus can't be added as a dependency for the liballoc crate.One of the biggest concerns was the fact that it references
ToOwned, which we don't really want to add into thecorecrate.Would it be possible to, perhaps, add
ToOwnedunder a perma-unstable feature flag tocore, alongside marking it for incoherent impls so it can have all its implementations inalloc? That way,hashbrownshould be able to access it without us changing the public API.(Also, if anyone with permissions to do so would like to update the issue description, that would be appreciated, since right now it is very outdated.)
Clar Fon at 2024-07-29 13:53:21
I would love it if we could find a lang way to allow splitting
Cloneintopub trait CloneFrom<Other: ?Sized> { fn clone_from(&mut self, other: &Other); } pub trait Clone : CloneFrom<Self> where Self: Sized { fn clone(&self) -> Self; }so that that
CloneFromcan be incorefor all the "update my arraystr from that str" cases, and thus reduce some of theToOwnedneeds...There's something similar in #126799
GrigorenkoPV at 2024-07-30 21:41:45
To make my former request even easier, here's a potential issue description you could copy-paste in to make it more up to date:
<details><summary>Rendered version</summary>Currently, `HashMap` and `HashSet` are only available in `std` because they depend on `RandomState`, even though this is only a *soft* dependency. These types could be moved to `alloc` as long as the user provides an implementation of `BuildHasher` themself. Right now, this is difficult because the implementation for `HashMap` and `HashSet` is in the external `hashbrown` crate, which depends on the `alloc` crate, and thus can't be used as a dependency for `alloc`. So, there are effectively two steps here: - [ ] Make `hashbrown` crate able to work without depending on the `alloc` crate. - [ ] Move code that doesn't depend on `RandomState` into `alloc` crate so we can have these types. - [ ] Convert the `std` types into aliases which default the `BuildHasher` type to `RandomState`, and implement the various methods for this default on the alias instead of its own type. ### Unresolved Questions * Should `hashbrown` still exist as a standalone crate once this move is complete? Its primary purpose at the moment is to have `no_std` support, so, we could potentially solve a lot of these problems by just moving its code directly into `alloc`. However, its `RawTable` API would have to be made available in these cases, and it's unclear how long that would take to stabilise. ### Implementation history *None currently.*Currently,
HashMapandHashSetare only available instdbecause they depend onRandomState, even though this is only a soft dependency. These types could be moved toallocas long as the user provides an implementation ofBuildHasherthemself.Right now, this is difficult because the implementation for
HashMapandHashSetis in the externalhashbrowncrate, which depends on thealloccrate, and thus can't be used as a dependency foralloc.So, there are effectively two steps here:
- [ ] Make
hashbrowncrate able to work without depending on thealloccrate. - [ ] Move code that doesn't depend on
RandomStateintoalloccrate so we can have these types. - [ ] Convert the
stdtypes into aliases which default theBuildHashertype toRandomState, and implement the various methods for this default on the alias instead of its own type.
Unresolved Questions
- Should
hashbrownstill exist as a standalone crate once this move is complete? Its primary purpose at the moment is to haveno_stdsupport, so, we could potentially solve a lot of these problems by just moving its code directly intoalloc. However, itsRawTableAPI would have to be made available in these cases, and it's unclear how long that would take to stabilise.
Implementation history
None currently.
</details>Clar Fon at 2024-08-03 19:30:16
- [ ] Make
Current progress/conclusion on figuring out how to make this work: I'm, very close to just copying the
hashbrowncode directly into liballoc. This would effectively solve all the current issues, but it would mean that we'd have to find another way to expose theRawTableAPIs that crates currently depend on. So, here's my thought process:- We make
hashbrownjust contain these raw table APIs which are currently unstable. Or at least, make its other stuff under a feature flag which does not requirealloc. (Not requiringalloccan be a separate feature flag only used by the standard library, since it will require unstablecore::allocstuff.) - We use those parts as the dependency in
alloc, and then implement the properHashMapandHashSetusing those types. - The raw table APIs can continue to evolve and iterate and eventually find their way into alloc, but this doesn't necessarily block
allocfrom incorporating the parts that we consider stable. (theHashMap/HashSetAPIs themselves)
This feels like the best conclusion since it avoids what I assume are some of the pitfalls to just moving everything into
alloc:- People still want the
RawTableAPIs, but they're not stable enough for the standard library to use. - People currently depending on the
no_stdversions can still use them stably while theallocversion is unstable.
This feels like the right approach to me, but please feel free to provide your feedback/concerns before I go all in on it, since it'll probably be a bit.
Clar Fon at 2024-08-03 19:57:16
- We make
Correct me if I'm wrong, but another reason to use the hashbrown crate right now is the Equivalent trait, which allows you to use some types for lookup that you can't with the standard library API. What will happen to that trait and the related methods under the plan you outlined?
Juhan Oskar Hennoste at 2024-08-03 20:29:10
Could we handle hashbrown the same way as backtrace-rs? Keep it available as external crate, but at the same time use
#[path = "..."] mod foo;to include it in the standard library.bjorn3 at 2024-08-03 20:38:00
Could we handle hashbrown the same way as backtrace-rs? Keep it available as external crate, but at the same time use
#[path = "..."] mod foo;to include it in the standard library.I had no idea we did this in the standard library, or that it works as you'd expect. That feels promising, actually…
Correct me if I'm wrong, but another reason to use the hashbrown crate right now is the Equivalent trait, which allows you to use some types for lookup that you can't with the standard library API. What will happen to that trait and the related methods under the plan you outlined?
I think that the best path forward would be to try and get this incorporated into the standard library, rather than rely on it existing in
hashbrown. While the raw table API is relatively complicated and thus won't be easy to stabilise, this isn't.
EDIT: Right now,
hashbrownjust refers to theequivalentcrate, which currently has 8 dependents on crates.io. If people are using this trait and have compelling examples, I would recommend they file the ACP for that on the standard library, likely moving the trait tocore::cmp::Equivalent. I was thinking of just writing up the ACP for this but genuinely cannot justify it myself, since as far as the publicly available data is concerned, nobody has ever had this problem.EDIT 2: There appears to be an issue open on the
equivalentcrate repo for moving this to libstd, so, I would encourage you put your discussion there first: indexmap-rs/equivalent#3Clar Fon at 2024-08-03 21:15:56