Command-line arguments are cloned a lot on Unix
The std::sys::unix::args module does a lot of allocation and cloning of command-line parameters:
- On startup, std::sys::unix::args::init copies all of the command-line arguments into a Box<Vec<Vec<u8>>> (except on macOS and iOS).
- When std::env::args or args_os is called, it eagerly copies all of the args into a new Vec<OsString>.
On non-Apple systems, this means there is at least one allocation and clone per argument (plus 2 additional allocations, for the outer Vec and Box) even if they are never accessed. These extra allocations take up space on the heap for the duration of the program.
On both Apple and non-Apple systems, accessing any args causes at least one additional allocation and clone of every arg. Calling std::env::args more than once causes all arguments to be cloned again, even if the caller doesn't iterate through all of them.
On Windows, for comparison, each arg is cloned lazily only when it is yielded from the iterator, so there are zero allocations or clones for args that are never accessed (update: at least, no clones in Rust code; see comments below).
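To make the per-call cost concrete, here is a minimal sketch using only the public std::env API. The helper names program_name and arg_count are hypothetical, chosen for illustration. Each helper constructs its own iterator, so on the Unix implementation described above each call would copy the full argument list, even though one helper reads a single element and the other reads none.

```rust
use std::env;

/// Hypothetical helper: returns only the first argument.
/// It still constructs a fresh `Args` iterator to do so.
fn program_name() -> Option<String> {
    env::args().next()
}

/// Hypothetical helper: reports how many arguments there are.
/// `ArgsOs` implements `ExactSizeIterator`, so `len()` is available
/// without yielding any item, but a fresh iterator is built first.
fn arg_count() -> usize {
    env::args_os().len()
}

fn main() {
    // Two separate iterator constructions happen here.
    println!("program: {:?}", program_name());
    println!("argument count: {}", arg_count());
}
```

A caller that needs the arguments in several places can avoid repeated construction by collecting them once and passing the resulting Vec around.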
Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says
It's really a shame that Windows command line parsing requires you to allocate memory. This means that to have a cross-platform API for command line arguments, even though in POSIX it can never fail, we have to handle the possibility because of Windows.
and I was wondering about our code regarding this.
Steve Klabnik at 2018-01-03 19:05:15
Some ideas on reducing each source of allocation/cloning (on startup, and on iterator construction):
- Copying on startup could be avoided by storing the argc and argv values that init receives from the OS, instead of cloning their contents. This could change behavior for programs that use unsafe platform-specific code to access these values directly and mutate them, then later call std::env::args. But such programs already behave inconsistently between different platforms (e.g. macOS versus Linux).
- Eager copying when constructing the Args iterator could be replaced by lazy cloning during iteration, as we already do on Windows. This requires that the data it clones from is guaranteed to last for the duration of the iterator (which has a 'static type, so it could be up to the duration of the program). This should be safe for data that is created in init and destroyed in cleanup as in the current non-Apple Unix implementation, since cleanup runs after catch_unwind(main). For data owned by the OS, it again can be affected by unsafe platform-specific code that mutates this data directly, but again I argue that such programs already have poorly-specified behavior.
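The difference between eager and lazy cloning can be sketched as follows. This is an illustration only, not the standard library's code: RAW_ARGS is a hypothetical stand-in for the process-wide argument storage, String stands in for OsString, and the copies field exists purely to make the number of clones visible.

```rust
/// Hypothetical stand-in for argument storage that lives for the whole program.
static RAW_ARGS: [&str; 3] = ["prog", "--verbose", "input.txt"];

/// Eager variant: every argument is cloned when the iterator is constructed.
struct EagerArgs {
    items: std::vec::IntoIter<String>,
    copies: usize,
}

impl EagerArgs {
    fn new(raw: &'static [&'static str]) -> Self {
        let owned: Vec<String> = raw.iter().map(|s| (*s).to_owned()).collect();
        EagerArgs { copies: owned.len(), items: owned.into_iter() }
    }
}

impl Iterator for EagerArgs {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        self.items.next()
    }
}

/// Lazy variant: an argument is cloned only when it is yielded.
/// This relies on the backing storage outliving the iterator,
/// which the 'static bound expresses here.
struct LazyArgs {
    raw: std::slice::Iter<'static, &'static str>,
    copies: usize,
}

impl LazyArgs {
    fn new(raw: &'static [&'static str]) -> Self {
        LazyArgs { raw: raw.iter(), copies: 0 }
    }
}

impl Iterator for LazyArgs {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        let s = self.raw.next()?;
        self.copies += 1;
        Some((*s).to_owned())
    }
}

fn main() {
    let mut eager = EagerArgs::new(&RAW_ARGS);
    let mut lazy = LazyArgs::new(&RAW_ARGS);
    // Read only the first argument from each iterator.
    let _ = eager.next();
    let _ = lazy.next();
    println!("eager copies: {}, lazy copies: {}", eager.copies, lazy.copies);
}
```

In this sketch, a caller that reads only the first argument pays for one clone with the lazy variant and for all of them with the eager one.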
Matt Brubeck at 2018-01-03 19:34:42
Incidentally, I was just reading http://andrewkelley.me/post/zig-december-2017-in-review.html which says
It's really a shame that Windows command line parsing requires you to allocate memory.
Ah, yes. On Windows the Args iterator constructor calls CommandLineToArgvW, which allocates an array of pointers and a single UTF-16 buffer to hold a copy of the args. So while it doesn't allocate and clone each arg individually, it does do 1 or 2 allocations, and copies the whole command line as UTF-16.

We definitely can't get to zero copies on Windows, because we at least need to do UTF-16 to UTF-8 conversion.
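As a rough illustration of why that conversion implies a copy, the portable sketch below converts a UTF-16 buffer into an owned String. The function name wide_to_string is hypothetical, and this is not how the standard library's Windows code is written; it only shows that the UTF-8 output has to live in a newly allocated buffer, because the two encodings use different code units.

```rust
/// Hypothetical helper: converts UTF-16 code units into an owned UTF-8 `String`.
/// Returns `None` if the input contains unpaired surrogates.
fn wide_to_string(wide: &[u16]) -> Option<String> {
    // `String::from_utf16` allocates a new buffer for the UTF-8 result.
    String::from_utf16(wide).ok()
}

fn main() {
    // Simulate an argument as the OS would hand it over in UTF-16.
    let wide: Vec<u16> = "--name=héllo".encode_utf16().collect();
    let arg = wide_to_string(&wide);
    println!("{:?}", arg);
}
```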
Matt Brubeck at 2018-01-03 19:55:28
#47165 eliminates the allocations/copies on startup.
Matt Brubeck at 2018-01-03 20:55:11
Triage: fixed by https://github.com/rust-lang/rust/pull/47165
And specifically not related to macOS: @rustbot label -O-macos
Mads Marquart at 2024-08-21 21:27:39