std::fs::canonicalize returns UNC paths on Windows, and a lot of software doesn't support UNC paths
Hi, I hope this is the right forum/format to register this problem, let me know if it's not.
Today I tried to use std::fs::canonicalize to make a path absolute so that I could execute it with std::process::Command. canonicalize returns so-called "UNC paths", which look like this: \\?\C:\foo\bar\... (sometimes the ? can be a hostname).
It turns out you can't pass a UNC path as the current directory when starting a process (i.e., Command::new(...).current_dir(unc_path)). In fact, a lot of other apps will blow up if you pass them a UNC path: for example, Microsoft's own cl.exe compiler doesn't support it: https://github.com/alexcrichton/gcc-rs/issues/169
It feels to me that maybe returning UNC paths from canonicalize is the wrong choice, given that they don't work in so many places. It'd probably be better to return a simple "absolute path", which begins with the drive letter, instead of returning a UNC path, and instead provide a separate function specifically for generating UNC paths for people who need them.
If this is too much of an incompatible change, maybe a new function for creating absolute paths should be added to std? I'd bet, however, that changing canonicalize itself would make more software start working than it would break.
`canonicalize` simply asks the kernel to canonicalize the path, which it does and happens to return the canonical path as a root local device path. Root local device paths are capable of representing some paths which normal absolute paths are incapable of representing accurately (such as components being named ".." or "." or having "/" in them), and they're the only way to call many system functions with paths longer than `MAX_PATH` (aside from being on some recent version of Windows 10 and having a certain registry key enabled). As a result, having `libstd` automatically strip the prefix would definitely break some situations. However, I'm definitely in favor of having more support in `libstd` for converting between these different kinds of paths so the user can easily turn a root local device path into an absolute path. I'd also love to have an `fs::normalize` which merely normalizes a possibly relative path into an absolute path without hitting the filesystem on Windows.
Peter Atashian at 2017-06-23 23:01:58
In reference to your commit which referenced this PR, normalization is not the same as merely joining the path onto the current directory, due to drive relative paths being relative to the current directory on the given drive. For example, given a drive relative path of `C:foo` and an `env::current_dir()` of `D:\bar`, normalizing `C:foo` will have to get the current directory for `C:\` and could end up being normalized to something radically different such as `C:\i\dont\even\foo`.
Peter Atashian at 2017-06-24 06:41:31
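To make the distinction concrete, here is a minimal sketch that tells a drive relative path such as `C:foo` apart from a drive absolute one such as `C:\foo` by looking only at the leading characters. The `Kind` enum and `classify` function are hypothetical names made up for this illustration; real code on Windows would more likely inspect `std::path::Component::Prefix`, and this sketch ignores many other path forms.

```rust
/// Rough classification of a Windows-style path string.
/// Hypothetical helper type for illustration only.
#[derive(Debug, PartialEq)]
enum Kind {
    /// Starts with `\\?\`.
    Verbatim,
    /// Drive letter followed by a separator, e.g. `C:\foo`.
    DriveAbsolute,
    /// Drive letter without a separator, e.g. `C:foo`;
    /// resolved against the current directory of that drive.
    DriveRelative,
    /// Anything this sketch does not try to recognise.
    Other,
}

fn classify(path: &str) -> Kind {
    if path.starts_with(r"\\?\") {
        return Kind::Verbatim;
    }
    let bytes = path.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        // The character after the colon decides whether the path is rooted.
        return match bytes.get(2) {
            Some(b'\\') | Some(b'/') => Kind::DriveAbsolute,
            _ => Kind::DriveRelative,
        };
    }
    Kind::Other
}

fn main() {
    for p in [r"C:foo", r"C:\foo", r"\\?\C:\foo", r"foo\bar"] {
        println!("{p} -> {:?}", classify(p));
    }
}
```

Only the `DriveRelative` case needs the per-drive current directory, which is why joining onto `env::current_dir()` is not enough for it.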
thanks, @retep998 :) it's just a hacked-together build tool that probably will eventually be replaced with something else, and I didn't intend to notify this ticket about my commit. but I guess it goes to show that a good way to get an absolute path in std would be really helpful.
Christopher Armstrong at 2017-06-24 06:44:46
`Command::current_dir` should be fixed. I doubt we will change `canonicalize`.
Note, the i-wrong tag is only for `Command::current_dir`, not the `canonicalize` behaviour.
Simonas Kazlauskas at 2017-06-25 20:40:49
Quick testing on Windows 10.0.15063 indicates that both `SetCurrentDirectoryW` and `CreateProcessW` are okay with a current directory starting with `\\?\`. They are not okay with a current directory that exceeds `MAX_PATH` regardless of `\\?\`. `CreateProcessW` is okay with the path to the process itself starting with `\\?\` regardless of whether the first parameter is used. `CreateProcessW` is only okay with the path to the process exceeding `MAX_PATH` if it starts with `\\?\` and is specified as the first parameter, which Rust does not currently use. I tested `std::process::Command::current_dir` and it works as expected, accepting paths starting with `\\?\` but rejecting any paths exceeding `MAX_PATH`.
Peter Atashian at 2017-06-25 23:11:33
Technically, AFAIK it is safe to strip the prefix in common simple cases (absolute path with a drive letter, no reserved names, shorter than max_path), and leave it otherwise.
So I think there's no need to compromise on correctness as far as stdlib goes. The trade-off is between failing early and exposing other software that doesn't support UNC paths vs maximizing interoperability with non-UNC software.
In an ideal world, I would prefer the "fail early" approach, so that limitations are quickly found and removed. However, Windows/DOS path handling has an exceptionally long and messy history, with decades of Microsoft bending over backwards to let old software not upgrade its path handling. If Microsoft can't push developers towards UNC, and fails to enforce this even in their own products, I have no hope of Rust shifting the Windows ecosystem to UNC. It will rather just frustrate Rust users and make Rust seem less reliable on Windows.
So in this case I suggest trying to maximize interoperability instead, and canonicalize to regular paths whenever possible (using UNC only for paths that can't be handled otherwise).
Also, careful stripping of the prefix done in stdlib will be much safer than other crates stripping it unconditionally (because realistically whenever someone runs into this problem, they'll just strip it unconditionally)
Kornel at 2017-09-05 01:31:31
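The "strip only in simple cases" idea above could be sketched roughly as follows. This is an illustrative, deliberately conservative string-level check, not the logic of std or of any existing crate. The function names (`simplify`, `is_legacy_safe_component`, `is_reserved_stem`), the set of rejected characters, and the list of device names are assumptions made for this example, and it errs on the side of keeping the prefix.

```rust
/// Assumed legacy limit, counted in UTF-16 units including the terminator.
const MAX_PATH: usize = 260;

/// True if `stem` (already upper-cased, text before the first dot) looks like
/// a DOS device name. Assumed list; intentionally over-approximates.
fn is_reserved_stem(stem: &str) -> bool {
    if matches!(stem, "CON" | "PRN" | "AUX" | "NUL" | "CONIN$" | "CONOUT$") {
        return true;
    }
    let mut chars = stem.chars();
    let head: String = chars.by_ref().take(3).collect();
    let digit = chars.next();
    let nothing_after = chars.next().is_none();
    (head == "COM" || head == "LPT")
        && nothing_after
        && matches!(digit, Some(d) if d.is_ascii_digit() || "¹²³".contains(d))
}

/// True if a single component means the same thing with or without `\\?\`.
fn is_legacy_safe_component(c: &str) -> bool {
    if c.is_empty() || c == "." || c == ".." {
        return false;
    }
    // Legacy parsing trims trailing dots and spaces, which would change the name.
    if c.ends_with(' ') || c.ends_with('.') {
        return false;
    }
    if c.chars().any(|ch| ch < ' ' || r#"<>:"/\|?*"#.contains(ch)) {
        return false;
    }
    // The device-name check applies to the part before the FIRST dot.
    let stem = c.split('.').next().unwrap_or("").trim_end_matches(' ').to_uppercase();
    !is_reserved_stem(&stem)
}

/// Returns the path without `\\?\` when that looks safe, otherwise unchanged.
fn simplify(path: &str) -> &str {
    let Some(rest) = path.strip_prefix(r"\\?\") else {
        return path;
    };
    let b = rest.as_bytes();
    let has_drive =
        b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\';
    if !has_drive || rest.encode_utf16().count() >= MAX_PATH {
        return path;
    }
    let tail = &rest[3..];
    if tail.is_empty() || tail.split('\\').all(is_legacy_safe_component) {
        rest
    } else {
        path
    }
}

fn main() {
    println!("{}", simplify(r"\\?\C:\foo\bar"));
    println!("{}", simplify(r"\\?\C:\foo\nul.tar.gz"));
    println!("{}", simplify(r"\\?\UNC\server\share"));
}
```

Anything the sketch does not recognise, including `\\?\UNC\...` paths, is returned untouched, so the worst case is an unnecessary prefix rather than a path whose meaning changed.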
@kornelski I completely agree. The current behavior is unexpected in my opinion.
Ofek Lev at 2017-10-09 05:47:52
I hope this is helpful…
According to Microsoft:
Note: File I/O functions in the Windows API convert `/` to `\` as part of converting the name to an NT-style name, except when using the `\\?\` prefix as detailed in the following sections.
Source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
And the Ruby language uses forward slashes for File paths and that works on Windows.
Daniel P. Clark at 2017-10-10 23:24:38
I've looked at this problem in detail. There are a few rules which need to be checked to safely strip the UNC prefix. It can be implemented as a simple state machine.
I've implemented that using public APIs, but because `OsStr` is opaque it's not nearly as nice as stdlib's implementation could have been:
https://lib.rs/dunce
So I'm still hoping canonicalize would do it automatically, because if it's done only for legacy-compatible paths there's no downside: all paths work for UNC-aware programs, and all paths that can work for legacy programs work too.
Kornel at 2017-11-22 14:17:54
Another example of this issue that I encountered in https://github.com/alexcrichton/cargo-vendor/pull/71:
url::Url.to_file_path() returns a non-UNC path (even if the URL was initialized with a UNC path). And std::path::Path.starts_with() doesn't normalize its arguments to UNC paths. So calling `to_file_path()` on a file: URL and then comparing it to the output of `canonicalize()` via `starts_with()` always returns false, even if the two paths represent the same resource:

```rust
extern crate url;

use std::path::Path;
use url::Url;

fn main() {
    // Path.canonicalize() returns a UNC path.
    let unc_path_buf = Path::new(r"C:\Windows\System").canonicalize().expect("path");
    let unc_path = unc_path_buf.as_path();

    // Meanwhile, Url.to_file_path() returns a non-UNC path,
    // even when initialized from a UNC path.
    let file_url = Url::from_file_path(unc_path).expect("url");
    let abs_path_buf = file_url.to_file_path().expect("path");
    let abs_path = abs_path_buf.as_path();

    // unc_path and abs_path refer to the same resource,
    // and they both "start with" themselves.
    assert!(unc_path.starts_with(unc_path));
    assert!(abs_path.starts_with(abs_path));

    // But they don't "start with" each other, so these fail.
    assert!(unc_path.starts_with(abs_path));
    assert!(abs_path.starts_with(unc_path));
}
```

Arguably, `to_file_path()` should return a UNC path, at least when initialized with one. And perhaps `starts_with()` should normalize its arguments (or perhaps clarify that it compares paths, not the resources to which they refer, and thus does no normalization). Also, the mitigation for consumers of this API is straightforward: `canonicalize()` all paths you compare if you do so to any of them. So maybe the current behavior is reasonable.
Nevertheless, it does feel like something of a footgun, so it's worth at least documenting how it differs from that of some other APIs on Windows.
Myk Melez at 2018-05-09 05:48:24
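The mitigation described above (canonicalize every path that takes part in the comparison) might look like the following sketch. `is_inside` is a hypothetical helper name, and the directory name under the temp dir is arbitrary. It only addresses the prefix mismatch; it does not claim to be a complete "is this file in that directory" check.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Canonicalizes both sides before the component-wise comparison, so both
/// paths are in the same form (both `\\?\`-prefixed on Windows).
/// Both paths must exist, because `fs::canonicalize` touches the filesystem.
fn is_inside(child: &Path, parent: &Path) -> io::Result<bool> {
    let child = fs::canonicalize(child)?;
    let parent = fs::canonicalize(parent)?;
    Ok(child.starts_with(&parent))
}

fn main() -> io::Result<()> {
    let parent = std::env::temp_dir();
    let child = parent.join("canonicalize_demo_dir");
    fs::create_dir_all(&child)?;

    println!("child inside parent: {}", is_inside(&child, &parent)?);

    fs::remove_dir(&child)?;
    Ok(())
}
```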
comparing it to the output of canonicalize() via starts_with() always returns false, even if the two paths represent the same resource:
Comparing canonical paths is a footgun in general because it is the wrong thing to do! Things like hard links and so on mean that such comparisons will never be entirely accurate. Please don't abuse canonicalization for this use case.
If you want to tell whether two paths point to the same file, compare their file IDs! That's what `same-file` does and it works great!
Peter Atashian at 2018-05-09 06:35:50
but `starts_with` is not for is-file-a-file comparison, but is-file-in-a-directory check. There are no hardlinks involved (and AFAIK apart from private implementation detail of macOS time machine, no OS supports directory hardlinks).
Kornel at 2018-05-09 13:19:45
There are more ways than just `\\?\C:\` and `C:\` to represent the same path, so unfortunately any sort of file in directory check is a hard problem. For example, paths can refer to drives via names other than drive letters.
Peter Atashian at 2018-05-09 13:31:23
bind mounts are equivalent to directory hardlinks.
Simonas Kazlauskas at 2018-05-09 14:01:43
I've been using conditional compilation when canonicalizing, and not doing it on Windows.
David O'Connor at 2020-05-02 18:13:09
Every time I get back into Rust I hit this, forget I've been here before, Google why canonicalize behaves differently on Windows than all the other languages' stdlib I've used, then return to this issue...
Is there an official way/well known crate to:
- get the normalized, absolute version of a path (no UNC) regardless of its existence, akin to Python's os.path.abspath
- get the normalized, absolute version of a path (no UNC) while resolving all symlinks, akin to Python's os.path.realpath
It's extremely painful to write cross-platform code due to this issue. You can even see the steady stream of bugs linking back to here.
I found:
- https://github.com/rust-lang/rust/issues/59117
- https://github.com/rust-cli/meta/issues/10
but no progress.
Perhaps I should just look at how `ripgrep` handles things and do that.
Ofek Lev at 2020-05-02 20:30:47
It looks like it should be an innocent path-cleaning fn, but it prepends combos of `/` and `?` on Windows that break things.
This feels like something where doing that's the right answer in some cases, but wrong in others. I think the answer is to have it take an enum where you have to explicitly specify whether you want the `//?//` bit. If you're not in one of those use cases, you're going to get burned. If you're developing on Linux, someone compiling the code for Windows will get burned.
David O'Connor at 2020-05-02 21:17:35
This continues to be both an annoyance, and a source of bugs from oversimplified workarounds.
https://github.com/webdesus/fs_extra/issues/29
Kornel at 2020-05-29 13:00:49
Unless I'm somehow mistaken, it seems that one tool that doesn't support UNC paths is rustc itself: I tried to pass a UNC path to `--extern` and it claimed the path didn't exist until I trimmed off the `\\?\`.
Tyler Mandry at 2020-08-01 04:17:51
https://github.com/rust-lang/rust/pull/89270
Sean Young at 2021-09-26 13:32:28
Still bugs occurring that link back to this issue...
Ofek Lev at 2022-10-27 18:41:13
When trying to work around the issue from `tauri::app::PathResolver::resolve_resource`, which internally uses `std::fs::canonicalize`, I found out that the following code:

```rust
println!("{}", path.starts_with(r"\\?\"));
println!("{}", path.to_string_lossy().starts_with(r"\\?\"));
print!("{}", path.display());
std::io::stdout().flush().unwrap();
```

gives the following output:

```
false
true
\\?\C:\Users\USER\Documents\github\repalungs\tauri-client\src-tauri\target\debug\assets\default.nifti.gz
```

which prevents the following workaround

```rust
#[cfg(target_os = "windows")]
let path: PathBuf = match path.strip_prefix(r"\\?\") {
    Ok(path) => path.to_path_buf(),
    Err(_) => path,
};
```

from working. I'd really suggest to use `dunce::simplified` instead.
Dmitrii - Demenev at 2023-03-14 23:26:03
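The `false`/`true` pair is what one would expect from `Path::starts_with` and `Path::strip_prefix` matching whole path components rather than characters. The sketch below shows that general behaviour with a platform-neutral path; the Windows case is presumably the same effect, with `\\?\C:` being parsed as a single prefix component that a bare `\\?\` cannot match.

```rust
use std::path::Path;

fn main() {
    let path = Path::new("/foo/bar");

    // Component-wise: "fo" is not a whole component of "/foo/bar".
    println!("{}", path.starts_with("/fo")); // prints "false"

    // Character-wise: the string does begin with "/fo".
    println!("{}", path.to_string_lossy().starts_with("/fo")); // prints "true"

    // A whole leading component matches as expected.
    println!("{}", path.starts_with("/foo")); // prints "true"
}
```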
[...] which prevents the following workaround

```rust
#[cfg(target_os = "windows")]
let path: PathBuf = match path.strip_prefix(r"\\?\") {
    Ok(path) => path.to_path_buf(),
    Err(_) => path,
};
```

from working. I'd really suggest to use dunce::simplified instead.
@JohnScience That "workaround" is most likely wrong in the first place since
UNC paths can change their meaning if they're reinterpreted as DOS paths.
Consider using https://lib.rs/crates/dunce which strips the prefix only when it is safe to do so.
See https://github.com/webdesus/fs_extra/issues/29 for the source of the citation.
Canonicalizing a path has a specific meaning which `std::fs::canonicalize` implements. I don't think that should be changed to let users stay ignorant of that. This blog post shows a myriad of ways Go suffers from sacrificing correctness for simplicity and I don't think that is the right approach for Rust.
While the behaviour is correctly documented, it could maybe be made more prominent in the docs. Although 3 out of 5 lines are already dedicated to this behaviour...
Tastaturtaste at 2023-03-15 16:12:32
@Tastaturtaste I'm the author of the `dunce` crate and the bug you're citing. I think `canonicalize` should strip the UNC prefix whenever possible.
Naive unconditional stripping of the prefix is wrong, but there are clear stable rules for when it's perfectly safe and correct to do so. I'll reiterate that this is not a sacrifice of correctness. It's purely an improvement in compatibility: paths that can't be losslessly converted to classic DOS paths can stay as UNC.
And compatibility improvement is unfortunately needed. Even Microsoft's own MSVC does not support UNC paths. Given slow-moving backwards-looking nature of Windows, the situation is unlikely to improve, so Rust is just stuck with an incompatible oddity.
Kornel at 2023-03-15 23:36:41
Has there been any update on potentially adding the correct functionality to the standard library?
Ofek Lev at 2023-09-07 22:37:01
There is now the (currently unstable) std::path::absolute that turns paths into an absolute form without adding `\\?\`.
`canonicalize` is more tricky. It returns whatever is returned by the OS so if it's `\\?\` prefixed then it'll need to be manually converted but only if the conversion is not lossy. However, deciding whether or not a path can round trip through a win32 path is not trivial and may change between versions of Windows. The only truly reliable way is to get Windows to actually round-trip it and see if you end up with the same path. There's also the tricky issue of changing long established behaviour. The `\\?\` prefix may break some people but others may be relying on it (e.g. people wanting a path that will bypass max path limits).
Probably the best way to make progress here is to have a function that attempts to perform the conversion. This would also be more generally useful.
Chris Denton at 2023-09-08 00:06:38
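For readers who only need an absolute path and not symlink resolution, a minimal usage sketch of `std::path::absolute` follows. It assumes a toolchain on which the function is available on stable; the input path is arbitrary and does not need to exist, since the function does not consult the filesystem for the path itself.

```rust
use std::io;
use std::path::{self, PathBuf};

/// Thin wrapper (hypothetical name) so the call site reads clearly.
/// Resolves against the current directory; does not resolve symlinks.
fn to_absolute(input: &str) -> io::Result<PathBuf> {
    path::absolute(input)
}

fn main() -> io::Result<()> {
    let p = to_absolute("some/relative/file.txt")?;
    // The result depends on the current directory, so no fixed output is shown.
    println!("{}", p.display());
    Ok(())
}
```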
deciding whether or not a path can round trip through a win32 path is not trivial
Yes, but the rules are documented.
and may change between versions of Windows
This is extremely unlikely to change in an incompatible way, because the documented limits of these paths exist for backward compatibility with legacy software that may have them hardcoded (e.g. binaries may have 260-byte long buffers for paths).
Windows can relax some of the limits, e.g. some win32 APIs can be configured to accept more than 260 bytes. Maybe someday `CON` will be a legal name everywhere. But even if Windows relaxes the limits, Rust sticking to the old limits won't create any problems. At worst it will unnecessarily use UNC paths in some edge cases. Currently it uses UNC paths every time, so it literally can't be any worse than it is now.
Kornel at 2023-09-08 22:21:45
it literally can't be any worse than it is now
I would like to echo this sentiment, it is not hyperbole https://twitter.com/mitsuhiko/status/1699381180536086714
The vast majority of code that deals with file paths and cares about Windows either uses https://docs.rs/dunce/ or has custom logic to strip the first UNC characters like https://github.com/sharkdp/fd/pull/72/files. Code that doesn't do one of these yet either will eventually when people open bug reports (see the dozens of issues that link to this issue) or will produce paths that cannot be used by everything else and cannot often be considered to support Windows.
Normally I wouldn't follow issues for this long because I have a great number of work, FOSS and personal matters that fill my time nearly completely but I happen to be a Windows user and this issue affects us. So, I will keep watching the developments in hope of a fix...
Ofek Lev at 2023-09-08 23:00:29
Yes, but the rules are documented.
Could you provide a link? I know of people reverse engineering the current algorithm but I'm not aware of Microsoft docs that describe the complete lossy transformation.
Chris Denton at 2023-09-09 01:36:29
- https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
- https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
- https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamea
edit: added link
Ofek Lev at 2023-09-09 02:50:37
I'm aware of those links. The first one is the most relevant and has had more details added over its lifetime but still does not document the path parsing algorithm in its entirety. I still think the most robust solution is to attempt to round-trip the path and see if it changes.
Chris Denton at 2023-09-09 03:06:23
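The round-trip idea could be sketched along these lines. This is only an illustration under assumptions: `try_simplify` is a made-up name, the 260 limit is applied as a separate guard, and the round trip is delegated to `std::path::absolute` (assumed stable on the toolchain in use) on the understanding that on Windows it normalizes non-verbatim paths through the OS. The prefix is dropped only if the stripped path comes back byte-for-byte identical.

```rust
use std::path::{self, Path, PathBuf};

/// Tries to drop a leading `\\?\`; returns the input unchanged whenever
/// the stripped form does not survive a round trip.
fn try_simplify(verbatim: &Path) -> PathBuf {
    const MAX_PATH: usize = 260; // assumed legacy limit, in UTF-16 units
    let keep = || verbatim.to_path_buf();

    let Some(text) = verbatim.to_str() else {
        return keep(); // not valid Unicode: leave it alone
    };
    let Some(stripped) = text.strip_prefix(r"\\?\") else {
        return keep(); // no verbatim prefix to remove
    };
    if stripped.encode_utf16().count() >= MAX_PATH {
        return keep(); // too long for legacy consumers
    }
    match path::absolute(stripped) {
        // Compare the raw strings, not `Path`s: `Path` equality ignores
        // some differences (such as `.` components) that matter here.
        Ok(round_tripped) if round_tripped.as_os_str() == stripped => round_tripped,
        _ => keep(),
    }
}

fn main() {
    let p = try_simplify(Path::new(r"\\?\C:\Users\example\file.txt"));
    // On non-Windows targets the stripped text is treated as a relative
    // path, fails the round trip, and the input is returned unchanged.
    println!("{}", p.display());
}
```

Forms such as `\\?\UNC\server\share` fall out naturally in this sketch: once stripped they are no longer absolute, so the round trip changes them and the original is kept.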
It's been more than 5 years now, do we have a reasonable solution?
Alan Silva at 2024-01-08 12:48:04
We now have `std::path::absolute` which will work better if you don't need to resolve symlinks. We do need to work towards stabilizing it however.
Removing the `\\?\` prefix still requires a third party crate. We could add a windows-only function to `Path` that attempts to remove it. We would however need to guard against changing the meaning of the path (which can happen in certain edge cases).
Chris Denton at 2024-01-08 16:11:05
Thanks for that but it's still extremely frustrating. No matter what I try I still get:
`\\?\UNC\applications2...`
in my CSV output file.
Using:
`let file_path = path::absolute(path)?.to_string_lossy().to_string();`
The expected value is `\\applications2\...` (it's a Windows network drive).
I tried `dunce` as well.
Alan Silva at 2024-01-08 17:02:36
What's the input path? `absolute` shouldn't add `\\?\UNC\` unless it's already in the input path (or in the current directory). I think `dunce` has issues with some paths, you could try `omnipath` instead.
Chris Denton at 2024-01-08 17:10:19
It's like `H:/opensight/` where `H:` is mapped to `\\applications2\sousaa`.
Alan Silva at 2024-01-08 17:16:10
Hm, using `std::path::absolute` should resolve that as `H:\opensight\` because it doesn't look up where it's mapped to.
Chris Denton at 2024-01-08 17:21:00
You're right, it worked, it's just that I used the wrong binary. Many thanks!
Alan Silva at 2024-01-08 17:31:12
Clearly the current behavior of std::fs::canonicalize is problematic, as evidenced by the comments and issues above. It seems to me that dunce does more or less what people expect std::fs::canonicalize to do on Windows.
The logic used by dunce seems preferable to what std::fs::canonicalize currently does in every case that I can think of. Dunce seems to have no outstanding issues, is widely used and trusted and should be compatible with all supported versions of Windows, so my simple proposal is this: why not upstream the logic used by dunce into std::fs::canonicalize? Is there any concrete objection to this?
Juhan Oskar Hennoste at 2025-01-08 02:42:07
I do think we should provide an equivalent to `dunce` in the standard library. Whether `std::fs::canonicalize` should use it by default, I'm not entirely sure.
That said, I still insist that most people do not need `std::fs::canonicalize` in the first place, and would be much better off using `std::path::absolute` or `same-file` depending on their use case.
Peter Atashian at 2025-01-08 09:16:30
I do think we should provide an equivalent to `dunce` in the standard library. Whether `std::fs::canonicalize` should use it by default, I'm not entirely sure.
I think having std::fs::canonicalize and dunce::canonicalize as separate functions in the standard library would just cause confusion for no extra value. What use case would prefer the current std::fs::canonicalize behavior over dunce::canonicalize? I can't think of a single one.
That said, I still insist that most people do not need std::fs::canonicalize in the first place, and would be much better off using std::path::absolute or same-file depending on their use case.
Maybe, but I think there are still valid and important use cases where std::fs::canonicalize/dunce::canonicalize is the right tool for the job. For example if I want to create some kind of lookup table where a file path is the key.
Juhan Oskar Hennoste at 2025-01-08 13:09:33
The problem is that people may be intentionally relying on `canonicalize` to produce `\\?\` style paths to bypass path length restrictions in the Windows API (or to convert to NT-style paths). So changing this now is a breaking change.
Solutions:
- As retep998 says, `std::path::absolute` is better for many purposes
- For when resolving all symlinks is needed, `canonicalize` is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any `\\?\` style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using `canonicalize`.
Chris Denton at 2025-01-08 13:35:46
For when resolving all symlinks is needed, `canonicalize` is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any `\\?\` style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using `canonicalize`.
That's just not going to work:
https://gitlab.com/kornelski/dunce/-/issues/3#note_1096103063
Teoh Han Hui at 2025-01-08 13:41:53
The length limit is not a problem. The UNC prefix can be kept for paths that exceed MAX_PATH or other limits.
The `dunce` crate has been doing this successfully for 7 years, and has millions of downloads per month.
Kornel at 2025-01-08 13:44:10
The problem is that people may be intentionally relying on `canonicalize` to produce `\\?\` style paths to bypass path length restrictions in the Windows API (or to convert to NT-style paths). So changing this now is a breaking change.
As kornelski pointed out, the path length limit is not a problem, because dunce will automatically preserve the UNC prefix if it is needed because of the path length limit.
People who rely on std::fs::canonicalize to convert to NT-style paths are in my opinion misusing std::fs::canonicalize for a purpose that it is not intended for. While the documentation does state that it will convert to extended length path syntax, I have always interpreted that as an implementation detail rather than a strong guarantee.
For when resolving all symlinks is needed, `canonicalize` is the right tool. Instead of changing the behaviour for everyone we could have a new method that converts any `\\?\` style path to its Win32 equivalent (if possible). This does have the advantage of being more versatile. E.g. you can call this method on paths received from other applications, not just when using `canonicalize`.
Having this as a separate method could be useful, but I still strongly believe that std::fs::canonicalize should do this conversion automatically (when it is safe to do so, as dunce does). If it needs to be done manually then people that want to use std::fs::canonicalize in cross-platform code will have to manually add the extra function call to strip UNC prefixes on Windows and will have to know and remember to do that every time (an easy thing to miss if you primarily test on Linux or Mac).
I think it is impossible to overstate that support for UNC paths on Windows is really spotty. Many programs will just refuse to work with UNC paths. You want to avoid UNC paths if at all possible.
Juhan Oskar Hennoste at 2025-01-08 14:01:46
I think it is impossible to overstate that support for UNC paths on Windows is really spotty. Many programs will just refuse to work with UNC paths. You want to avoid them if at all possible.
I don't disagree with that. Hence why `readlink`, for example, does attempt the conversion.
But changing the documented behaviour of `canonicalize` is a breaking change and we have no way to gauge its impact. I've been (unintentionally) responsible for breaking other people's code before. I don't like it.
Chris Denton at 2025-01-08 14:12:32
As kornelski pointed out, the path length limit is not a problem, because dunce will automatically preserve the UNC prefix if it is needed because of the path length limit.
It is a breaking change. Users can push new path components to the path after canonicalization and then use that new path. I've done this a few times in some of my projects. At least the tar crate also seems to do this.
As a side note, I'm also a bit concerned about dunce's ability to implement the specified path parsing. The documentation has changed in the past:
Old: https://web.archive.org/web/20220920223716/https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
Current: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
LPT¹, LPT², and LPT³, which were previously not reserved, became reserved. The docs also clarified what they mean by "followed immediately by an extension". This clarification shows that `dunce` doesn't implement reserved file handling correctly:

```rust
fn main() {
    let cwd = std::env::current_dir().unwrap();
    let cwd = cwd.canonicalize().unwrap();
    dbg!(&cwd);

    let test_file = cwd.join("nul.tar.gz");

    std::fs::write(&test_file, "Hello World!").unwrap();

    let canonical_test = test_file.canonicalize().unwrap();
    dbg!(&canonical_test);
    dbg!(std::fs::read_to_string(canonical_test).unwrap());

    let dunce_test = dunce::canonicalize(test_file).unwrap();
    dbg!(&dunce_test);
    dbg!(std::fs::read_to_string(dunce_test).unwrap());
}
```

```
[src/main.rs:4:5] &cwd = "\\\\?\\C:\\Users\\natha\\Desktop\\html\\dunce-test"
[src/main.rs:11:5] &canonical_test = "\\\\?\\C:\\Users\\natha\\Desktop\\html\\dunce-test\\nul.tar.gz"
[src/main.rs:12:5] std::fs::read_to_string(canonical_test).unwrap() = "Hello World!"
[src/main.rs:15:5] &dunce_test = "C:\\Users\\natha\\Desktop\\html\\dunce-test\\nul.tar.gz"
[src/main.rs:16:5] std::fs::read_to_string(dunce_test).unwrap() = ""
```

So, using `dunce` can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.
Also just my 2 cents, but I think making `canonicalize` return a different path type based on whether it can safely de-UNC a path would also be confusing. UNC and DOS paths can have different semantics for path components. Also, on some computers where a project or data directory is deeply nested, a `canonicalize` call on it could return a UNC path, while someone else's computer could return a normal path. This could make pushing components on that path and creating files work on one person's computer, but fail on another's. While I think this potential confusion is fine for a crate, I'd expect a stdlib api to be more predictable. I would also prefer if attempting to de-UNC a path was a manual operation. I may be a bit biased though, I only rarely run into issues with UNC paths and I just reach for `dunce` when I do.
As a side note, Windows 11 seems to not have reserved file names, but I can't find the documentation for this behavior.
nathaniel-daniel at 2025-01-08 17:31:28
So, using `dunce` can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.
The other alternative crate `normpath` that has been suggested before in this thread calls `GetFullPathNameW` directly:
https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L143
Teoh Han Hui at 2025-01-08 17:36:41
So, using `dunce` can in some cases cause issues where there weren't any before. I also don't think there's anything stopping something similar from happening in the future.
The other alternative crate `normpath` that has been suggested before in this thread calls `GetFullPathNameW` directly:
https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L143
Unfortunately, this does not seem to be a suitable replacement in general for a canonicalization operation: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew "This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."
I also haven't tested, but is this not the same as std::path::absolute? "On Windows, for verbatim paths, this will simply return the path as given. For other paths, this is currently equivalent to calling GetFullPathNameW."
nathaniel-daniel at 2025-01-08 17:41:57
"This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."
Yes, but...
https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L149
I guess what most people actually want is sort of a combination of what `canonicalize` does for POSIX + what `absolute` does for Windows? (Or in other words, "`canonicalize` but no UNC paths on Windows please".) Hence the popularity of the crates...
Teoh Han Hui at 2025-01-08 17:45:26
"This function does not verify that the resulting path and file name are valid, or that they see an existing file on the associated volume."
Yes, but...
https://github.com/dylni/normpath/blob/d65453fdb39ee4091846732477975fb665b1a7dd/src/windows/mod.rs#L149
I guess what most people actually want is sort of a combination of what `canonicalize` does for POSIX + what `absolute` does for Windows? (Or in other words, "`canonicalize` but no UNC paths on Windows please".) Hence the popularity of the crates...
Canonicalization is a lot more than confirming the file exists. You also need to resolve links, which I don't think GetFullPathNameW will do. I think most people probably want std::path::absolute, but I don't really want to generalize.
nathaniel-daniel at 2025-01-08 17:54:53
Users can push new path components to the path after canonicalization and then use that new path.
If you have a legacy path, and push a component to it that makes it exceed the legacy path length limit, then it's no longer a valid path. In a sense, `push()` can create a syntax error by continuing to use the legacy path syntax even when the paths cease being representable in the legacy path syntax.
Kornel at 2025-01-12 22:14:04
I've advocated for having a smarter push() before that could do things like transform / into \ if you're a Linux user who thinks writing .push("foo/bar") is correct code only for it to fail on Windows when a UNC path is involved.
Peter Atashian at 2025-01-13 00:09:54
I've advocated for having a smarter push() before that could do things like transform / into \
Indeed and that was implemented:
    use std::path::PathBuf;

    fn main() {
        let mut p = PathBuf::from(r"\\?\server\share");
        p.push("a/b");
        println!("{}", p.display()); // on Windows, prints `\\?\server\share\a\b`
    }
Chris Denton at 2025-01-13 00:15:21
Users can push new path components to the path after canonicalization and then use that new path.
Maybe this is a bug in PathBuf::push? push() could be expected to add the UNC prefix (and appropriate component normalization) when it's necessary to keep the longer path working.
If you have a legacy path, and push a component to it that makes it exceed the legacy path length limit, then it's no longer a valid path. In a sense, push() can create a syntax error by continuing to use the legacy path syntax even when the paths cease being representable in the legacy path syntax.
I don't think a legacy path -> UNC path conversion is possible using just the push function.
This conversion requires resolving links, since if the old legacy path used links it will break if it becomes a UNC path. Those link components would be interpreted verbatim instead of resolving the link. This means a conversion like this would have to check the filesystem, a fallible operation. A push operation is expected to be infallible due to its signature. I also don't think it would be advisable to have a push operation have a chance to hit the filesystem.
Maybe introducing a new smart_push function or something would work, but that's its own can of worms and doesn't resolve the backwards compatibility issue here.
Also, this conversion may not even be needed in some cases. There's an option to allow legacy paths to exceed MAX_PATH, though it's a bit convoluted. Cargo seems to use this option. However, like with UNC paths, these longer paths are not compatible with all APIs. I would also think that they would fail if passed to a non-long-path-aware program.
nathaniel-daniel at 2025-01-13 00:38:14
This conversion requires resolving links, since if the old legacy path used links it will break if it becomes a UNC path. Those link components would be interpreted verbatim instead of resolving the link.
Can you give an example? If you're talking about symbolic links, those resolve fine in \\?\ paths.
This works even in nontrivial cases with multiple symlinks in the same path, links to both files and directories, and where the symlinks themselves represent their targets in ways \\?\ paths themselves do not allow. As shown there, the symlinks' targets can even be relative and contain .. components, even though \\?\ paths must be absolute and any .. components in the \\?\ paths themselves would be treated literally.
For symlinks, as well as any other kind of reparse point, treating it literally in the sense that is relevant to \\?\ path interpretation is completely consistent with, and does not preclude, following the link. Those two concepts are independent.
But it may be that I am just not understanding what you're describing. I thought of symlinks, but you may mean something else by "links".
Eliah Kagan at 2025-01-13 01:28:05
No, I was definitely thinking about symlinks. I even remember testing this behavior before my first post here using mklink /D link dir. But I can't replicate what I did. I think I'm simply mistaken, somehow. Thanks for catching that.
Maybe push could be smarter here after all. That would be nice.
nathaniel-daniel at 2025-01-13 02:14:56
The benefit of having an improved push implementation is rather inconsequential as the issue is mainly about interoperability and expectations. Even something as simple as displaying a resolved path to the human, non-Windows expert user is problematic.
Has there ever been a technically-breaking change that occurred in a minor Rust release because it was considered a bug fix or unintended behavior?
Ofek Lev at 2025-01-13 02:35:02
The benefit of having an improved push implementation is rather inconsequential, the issue is interoperability. Even something as simple as displaying a resolved path to the human, non-Windows expert user is problematic.
The benefit of a smarter push would be that it would behave like normal for most paths. Only when it detects a situation where pushing a component would create a path the end user couldn't use as-is would it convert it to UNC. So theoretically, nothing would change for cases where things aren't already broken, and broken things might work in more cases.
That isn't to say it's without issues. As an example, perhaps the application is long path aware and the UNC conversion isn't necessary. Or maybe the path is being sent to a long path aware application. Or like you said, maybe it's simply for display and users may be confused by the prefix (though I'd argue for showing a UNC path anyway, in case they decide to copy/paste it into another non-long-path-aware program and wonder why it didn't work).
Also, a smarter push would make it easier to rationalize de-UNC-ing automatically within canonicalize.
Has there ever been a technically-breaking change that occurred in a minor Rust release because it was considered a bug fix or unintended behavior?
Rust has shipped breaking changes in minor releases in the past, though that's at the language level. I don't recall any at the stdlib level, but I would guess that they exist. There was an issue with changing the layout of a stdlib type, but I consider the libraries to be at fault and the change was pretty well coordinated with a lot of warning. I think the idea is to minimize breaking changes and their impact.
nathaniel-daniel at 2025-01-13 03:06:56
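As a rough illustration of the "smarter push" idea discussed above, here is a string-based sketch that behaves like a plain join for ordinary paths and only adds the \\?\ prefix once the result would exceed the legacy length limit. Everything here is an assumption made for illustration: smart_push is a hypothetical name, not a std API, and real code would operate on Path/OsStr and handle more prefix kinds.

```rust
/// Windows' legacy limit: 260 UTF-16 units, including the terminating NUL.
const MAX_PATH: usize = 260;

/// Hypothetical `smart_push` on strings (not a std API).
fn smart_push(base: &str, component: &str) -> String {
    // Accept both separators in the pushed text and drop empty pieces.
    let parts: Vec<&str> = component
        .split(|c: char| c == '/' || c == '\\')
        .filter(|s| !s.is_empty())
        .collect();
    if parts.is_empty() {
        return base.to_string();
    }

    let mut joined = base.trim_end_matches('\\').to_string();
    for part in &parts {
        joined.push('\\');
        joined.push_str(part);
    }

    let b = joined.as_bytes();
    let drive_absolute =
        b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\';
    let too_long = joined.encode_utf16().count() >= MAX_PATH;
    // Verbatim paths take `.` and `..` literally, so leave such paths alone.
    let has_dots = joined.split('\\').any(|c| c == "." || c == "..");

    if drive_absolute && too_long && !has_dots {
        format!(r"\\?\{joined}")
    } else {
        joined
    }
}

fn main() {
    println!("{}", smart_push(r"C:\data", "a/b"));
    let long = smart_push(r"C:\data", &"x".repeat(255));
    println!("long result starts with \\\\?\\: {}", long.starts_with(r"\\?\"));
}
```

Short paths come back unchanged in legacy form, so nothing changes for code that already works. The open questions from the thread remain: a long-path-aware application would not need the prefix, and a path that is only displayed to a user may be better left without it.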