Lint for undesirable, implicit copies
As part of https://github.com/rust-lang/rust/issues/44619, one topic that keeps coming up is that we have to find some way to mitigate the risk of large, implicit copies. Indeed, this risk exists today even without any changes to the language:
let x = [0; 1024 * 10];
let y = x; // maybe you meant to copy 10K words, but maybe you didn't.
In addition to performance hazards, implicit copies can have surprising semantics. For example, there are several iterator types that would implement Copy, but we were afraid that people would be surprised. Another, clearer example is a type like Cell<i32>, which could certainly be copy, but for this interaction:
let x = Cell::new(22);
let y = x;
x.set(23);
println!("{}", y.get()); // prints 22
For a time before 1.0, we briefly considered introducing a new Pod trait that acted like Copy (memcpy is safe) but without the implicit copy semantics. At the time, @eddyb argued persuasively against this, basically saying (and rightly so) that this is more of a linting concern than anything else -- the implicit copies in the example above, after all, don't lead to any sort of unsoundness, they just may not be the semantics you expected.
Since then, a number of use cases have arisen where having some kind of warning against implicit, unexpected copies would be useful:
- Iterators implementing
Copy - Copy/clone closures (closures that are copy can be surprising just as iterators can be)
Cell,RefCell, and other types with interior mutability implementingCopy- Behavior of
#[derive(PartiallEq)]and friends with packed structs - A coercion from
&TtoT(i.e., it'd be nice to be able to dofoo(x)wherefoo: fn(u32)andx: &u32)
I will writeup a more specific proposal in the thread below.
As part of https://github.com/rust-lang/rust/issues/44619, one topic that keeps coming up is that we have to find some way to mitigate the risk of large, implicit copies. Indeed, this risk exists today even without any changes to the language:
let x = [0; 1024 * 10]; let y = x; // maybe you meant to copy 10K words, but maybe you didn't.In addition to performance hazards, implicit copies can have surprising semantics. For example, there are several iterator types that would implement
Copy, but we were afraid that people would be surprised. Another, clearer example is a type likeCell<i32>, which could certainly be copy, but for this interaction:let x = Cell::new(22); let y = x; x.set(23); println!("{}", y.get()); // prints 22For a time before 1.0, we briefly considered introducing a new
Podtrait that acted likeCopy(memcpy is safe) but without the implicit copy semantics. At the time, @eddyb argued persuasively against this, basically saying (and rightly so) that this is more of a linting concern than anything else -- the implicit copies in the example above, after all, don't lead to any sort of unsoundness, they just may not be the semantics you expected.Since then, a number of use cases have arisen where having some kind of warning against implicit, unexpected copies would be useful:
- Iterators implementing
Copy - Copy/clone closures (closures that are copy can be surprising just as iterators can be)
Cell, ~~RefCell, and other types with interior mutability implementingCopy~~- Behavior of
#[derive(PartiallEq)]and friends with packed structs - A coercion from
&TtoT(i.e., it'd be nice to be able to dofoo(x)wherefoo: fn(u32)andx: &u32)
I will writeup a more specific proposal in the thread below.
Ralf Jung at 2022-03-06 16:27:01
- Iterators implementing
I think the lint would work roughly like this. We define a set of types that should not be implicitly copied. This would include any type whose size is >N bytes (where N is some arbitrary number, perhaps tunable via an attribute on the crate), but also any struct that is tagged with
#[must_clone]. The name of#[must_clone]is probably bad; it is meant to be reminiscent of#[must_use]-- basically it's saying that, to copy this type, one ought to clone it (e.g.,let y = x.clone()rather thanlet y = x), so that the semantics are clear to the reader.We would then comb code for implicit copies. This is (somewhat) easy to define at the MIR level: any use of a user-defined variable (not a temporary) that is of a type implementing
Copy. Here are some examples where this would trigger. Let's assumexis a variable of "must clone" type:let y = x; foo(x); let y = [x; 22]; // at least I think this would count =)One interesting case that would, by this definition, be linted against is this:
let z = move || use(x); // copies `x` into the closure bodyThis one is interesting because, unlike the others, there is no way to readily silence the lint, except by adding an
#[allow(implicit_copy)]. That is usually a bad sign -- we may not want to lint on this case. One can argue, certainly, that themoveis sufficient to warn the reader. Not sure if I buy that. =)It is also interesting that the use cases here perhaps breakdown into two categories:
- Big things that may not want to be copied at all.
- Things where it's ok to copy as a last-use (e.g.,
Cell<i32>, iterators, closures).
In fact, I would argue that the latter is the majority case, so maybe we just want to separate this into two lints, or else say that structs with the
#[must_clone]tag only warn if -- after the implicit copy -- there is a subsequent use. This would actually solve themove || use(x)problem as well, since one could rewrite the closure expression to:let z = { let x0 = x.clone(); move || use(x0) // this is a last-use of `x0`, hence no warning }; use(x); // uses original `x`Niko Matsakis at 2017-11-01 13:39:51
I would prefer it if this was more specific, e.g.
forloops warning if the iteratee is used afterwards.Eduard-Mihai Burtescu at 2017-11-01 21:28:32
@nikomatsakis
You don't want to warn on any use - that would warn on moves too. You want to warn on uses where the variable is live after the use.
Ariel Ben-Yehuda at 2017-11-02 12:53:07
@arielb1 indeed.
Niko Matsakis at 2017-11-08 18:20:27
@eddyb do you have a specific example of something you would want to warn, and something you would not want?
Niko Matsakis at 2017-11-08 18:21:13
@nikomatsakis Copying a
Rangearound shouldn't warn (just like any other pair). Using aCopyvariable implementingIntoIterator/Iterator(e.g. aRange) after iterating on it with aforloop should warn, as previously discussed.Eduard-Mihai Burtescu at 2017-11-08 20:03:01
@eddyb it seems like we could generalize then to a lint that works like this:
- Types can be marked as
#[must_clone]:- Examples:
Cell - Warning arises if some
PlaceX (of such a type) is copied and then used again later.
- Examples:
- Methods can be marked as
#[must_clone]:- Example:
IntoIterator::into_iter - Warning arises if the method is called on some
PlaceX and then X is used again later.
- Example:
Maybe a better name than
#[must_clone], especially for the second case.Niko Matsakis at 2018-01-23 21:55:09
- Types can be marked as
Anyway, regardless of the nitty gritty details, let me leave a few pointers here for how we might implement this. @cramertj is hopefully gonna take a look.
First off, let's look a bit at MIR. MIR definition is here.. A crucial concept I referenced above is a
Place, which represents a path:https://github.com/rust-lang/rust/blob/4e3901d35f6a8652f67111e7272263c9e62ab3e1/src/librustc/mir/mod.rs#L1231-L1243
The arguments to MIR statements are always operands, which explicitly identify a copy:
https://github.com/rust-lang/rust/blob/4e3901d35f6a8652f67111e7272263c9e62ab3e1/src/librustc/mir/mod.rs#L1399-L1403
I think right now we will produce a
Copyoperand whenever the input is of copy type, and aMoveoperand whenever the input may not be ofCopytype. It is an error to use a value after it is moved. So our analysis would look forCopyoperands.One way to do that is with the MIR visitor, which lets you quickly walk the IR. In particular we'd probably want to override the
visit_operandmethod:https://github.com/rust-lang/rust/blob/4e3901d35f6a8652f67111e7272263c9e62ab3e1/src/librustc/mir/visit.rs#L142-L146
There is code for computing liveness -- basically, whether a variable may be used again. That code lives in
liveness.rs. You can useliveness_of_localsto compute what is live and where:https://github.com/rust-lang/rust/blob/4e3901d35f6a8652f67111e7272263c9e62ab3e1/src/librustc_mir/util/liveness.rs#L117-L120
but that only gives you values at basic block boundaries. You will want
simulate_blockto walk through and find out what is live at any particular point:https://github.com/rust-lang/rust/blob/4e3901d35f6a8652f67111e7272263c9e62ab3e1/src/librustc_mir/util/liveness.rs#L161-L167
So I guess that roughly the idae would be to identify copies where the value's type is marked to be linted. We would then compute the liveness at the point of copy and see if the
Placebeing moved may be referenced again.Some annoying challenges:
- liveness at present is computed only for local variables;
- so if somebody moves from
a.band then usesa.c, we can't really tell that - the move related code from dataflow is better here I guess, but I don't think we currently have an analysis for all uses, only for moves
- still, woulnd't be hard to adapt, right? Maybe there's a reason it would be? I'm not sure why we didn't use dataflow in the first place here.
- so if somebody moves from
Still, this is a lint after all, it doesn't have to be totally precise. I think that for the first version of the code I would only worry about local variables and ignore more complex
Placevalues. That seems like a simple thing to fix up later.Niko Matsakis at 2018-01-23 22:50:59
- liveness at present is computed only for local variables;
@qmx may take a look at this, since @cramertj never had time
Niko Matsakis at 2018-02-16 00:35:10
@qmx I started https://github.com/cramertj/rust/tree/copy-lint if you want to take a look. Not sure how useful it is-- I only got the skeleton.
Taylor Cramer at 2018-02-16 00:52:57
RefCellcannot beCopy. The problem is that if you have&RefCell<T>, you could end up copying it while another thread has a mutable reference to the content of theRefCell. That'd be unsound.Same for
Mutex,RwLock.EDIT: As @rkruppe pointed out, "thread" doesn't make sense here. That doesn't change the problem though, see below.
Ralf Jung at 2018-03-28 16:28:02
RefCell is not Sync so how could you have a
&RefCell<T>in one thread while another thread can access the referent?Hanna Kruppe at 2018-03-28 16:31:36
This can all happen in one thread.
Ralf Jung at 2018-03-28 19:44:21
let r = RefCell::new(42); let r1 = &r; let r2 = &r; let _mutref = r1.borrow_mut(); let illegal_copy = *r2; // copying while there is an outstanding mutable referenceRalf Jung at 2018-03-28 19:45:51
Oh yeah you're right. The mention of another thread really mislead me there.
Hanna Kruppe at 2018-03-28 19:46:42
Sorry for that, not sure what I thought. "Another function" is really what I meant (though it can be the same, as in my counterexample).
Ralf Jung at 2018-03-28 19:47:13
@RalfJung in retrospect, the RefCell problem is obvious -- basically, of course copying is bad, for the same reason that
borrow()isn't infallible. Silly me.Niko Matsakis at 2018-04-03 20:59:49
Just found out about https://github.com/rust-lang/rfcs/issues/2407 - that would be a nice suggestion for the linter in the future, wouldn't it?
Doug Campos at 2018-04-20 20:20:55
Just as a data point, this issue affects real world programs. Recently rav1e has eliminated an implicit copy of a large type and gained speed thanks to it.
Changing &table to table in this statement (and the similar statement in ac_q) eliminated an implicit copy onto the stack, reducing the encoding time by 2.5%.
est31 at 2021-02-27 18:22:33
Just wanted to note that we did add a lint that highlights all moves larger than a configured limit https://github.com/rust-lang/rust/pull/83519
Felix S Klock II at 2021-10-26 15:35:05