parent() returns Some("") for single-component relative paths

Opened by Jack O'Connor at 2023-12-17 07:39:51
use std::path::Path;

fn main() {
    println!("{:?}", Path::new("/").parent());  // None
    println!("{:?}", Path::new(".").parent());  // Some("")
    println!("{:?}", Path::new("foo").parent());  // Some("")
}

The latter two cases feel weird to me. Some("") by itself is kind of a contradiction, on the one hand saying "yes there is a parent" and on the other hand returning an invalid path that really means "no there isn't actually a parent." We've also tried to avoid creating empty path components in other cases, like the double-slash case Path::new("a//b").parent(), which returns Some("a") rather than Some("").
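
To make the comparison concrete, here is a small sketch that puts the double-slash case next to the single-component cases. The wrapper name parent_of is made up for illustration; it only forwards to Path::parent.

```rust
use std::path::Path;

/// Hypothetical wrapper (not part of std) so the cases can be compared side by side.
fn parent_of(s: &str) -> Option<&Path> {
    Path::new(s).parent()
}
```

With the behavior described in this issue, parent_of("a//b") is Some("a"), while parent_of("foo") and parent_of(".") are both Some("").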

For consistency with /, it probably makes sense to have the parent of . be None. For foo I could imagine going either of two ways, either Some(".") or None. If folks agree that one of those options would be nice in theory, then I guess the second question is whether a behavior change here would break backwards compatibility too much to consider doing it. Would it make sense for me to put together a PR and then ask for a Crater run or something like that?

  1. Technically the parent of . is .., but we cannot return that because of an oversight in the design of the function (it returns a borrowed Path, not PathBuf/Cow<Path>).

    Simonas Kazlauskas at 2016-09-30 16:06:16

  2. cc @aturon

    These sorts of subtle details in behavior of path are notoriously difficult and often have surprising results. One helpful comparison is to see what a bunch of other path libraries do as well in situations like this.

    Alex Crichton at 2016-09-30 19:33:06

  3. Python is inconsistent between os.path and pathlib.Path, so unfortunately "doing what Python does" isn't really an option. The main consistent thing between their two implementations, which is notably different from Rust, is that the parent relationship is circular once you get to the bottom of it. On the question of "is the empty string the parent of anything?" they give different answers. Here's some more detail:

    os.path.dirname

    os.path.dirname("/") == "/"
    os.path.dirname("/foo") == "/"
    os.path.dirname("/foo/") == "/foo"
    os.path.dirname("") == ""
    os.path.dirname("foo") == ""
    os.path.dirname("foo/") == "foo"
    os.path.dirname("foo/bar") == "foo"
    os.path.dirname("./foo") == "."
    

    Looking at the implementation in CPython, it seems to be

    1. drop everything to the right of the right-most separator (or just everything, if there are no separators)
    2. rstrip any remaining separators off of what's left, unless the remaining string is composed entirely of separators

    The "unless" in (2) is how they maintain dirname("/") == "/", though it also has the weird consequence that dirname("//") == "//" and so on.
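
    For readers who prefer Rust, the two steps above can be sketched roughly like this. This is my own transliteration for '/'-separated strings only, not CPython's code, and the function name dirname is just borrowed from Python for illustration.

    ```rust
    /// Sketch of the two steps described above, for '/'-separated paths only.
    fn dirname(p: &str) -> &str {
        // Step 1: keep everything up to and including the right-most separator
        // (or nothing at all if there is no separator).
        let head = match p.rfind('/') {
            Some(i) => &p[..=i],
            None => "",
        };
        // Step 2: strip trailing separators, unless the head consists only of
        // separators (this is what keeps dirname("/") == "/").
        if head.chars().all(|c| c == '/') {
            head
        } else {
            head.trim_end_matches('/')
        }
    }
    ```

    Under these assumptions the sketch reproduces the table above, including the dirname("//") == "//" oddity.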

    Summary: not very smart about trailing separators or dots, postpones normalizing things until it needs to, willing to give you an empty string, eventually circular if you call it enough times

    pathlib.Path.parent

    Path("/").parent == Path('/')
    Path("/foo").parent == Path('/')
    Path("/foo/").parent == Path('/')
    PureWindowsPath("C:\\foo").parent == PureWindowsPath("C:\\")
    PureWindowsPath("C:\\").parent == PureWindowsPath("C:\\")
    Path("") == Path('.')
    Path("").parent == Path('.')
    Path(".").parent == Path('.')
    Path("foo").parent == Path('.')
    Path("foo/").parent == Path('.')
    Path("foo/./././").parent == Path('.')
    Path("./foo").parent == Path('.')
    

    Pathlib tries to be smarter. It drops all duplicated separators, trailing separators, and single dots when paths are constructed. Unless the path is (or would be) empty, in which case it becomes a single dot.
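
    A rough Rust sketch of that normalize-then-take-the-parent idea, for comparison. It handles plain '/'-separated strings only, ignores Windows drives and any other special cases pathlib knows about, and the names normalize and pathlib_like_parent are hypothetical.

    ```rust
    /// Drop duplicate separators, trailing separators, and single dots.
    /// An empty relative result becomes ".".
    fn normalize(p: &str) -> String {
        let absolute = p.starts_with('/');
        let parts: Vec<&str> = p
            .split('/')
            .filter(|s| !s.is_empty() && *s != ".")
            .collect();
        match (absolute, parts.is_empty()) {
            (true, _) => format!("/{}", parts.join("/")),
            (false, true) => ".".to_string(),
            (false, false) => parts.join("/"),
        }
    }

    /// Parent of the normalized path; circular at "/" and ".".
    fn pathlib_like_parent(p: &str) -> String {
        let n = normalize(p);
        match n.rfind('/') {
            Some(0) => "/".to_string(),    // "/foo" or "/" itself
            Some(i) => n[..i].to_string(), // "a/b" -> "a"
            None => ".".to_string(),       // single relative component
        }
    }
    ```

    The point of the sketch is that once normalization has happened, the parent never needs to be an empty string.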

    Summary: smart about separators and dots, aggressive about normalization, never willing to give you an empty string, also eventually circular

    Jack O'Connor at 2016-09-30 20:13:29

  4. On the question of returning Some(".") vs None, I can imagine a few different invariants we could decide to maintain:

    • If a path exists on the filesystem, and its parent is Some, then its parent exists on the filesystem too. Both sides maintain this. (Though libstd's current behavior violates this.)
    • If a path refers (or could refer) to a file, then its parent will always be Some. Only the Some(".") approach maintains this.
    • If a path's parent is Some, the parent is a prefix of it. Only the None approach maintains this.

    That last invariant, the prefix one, is important for non-empty parents in most cases, since we don't want parent to have to allocate a PathBuf. But in this specific case we can return "." as a &'static Path, so we can get away with breaking it. Would a caller care about getting a prefix back for any other reason?
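
    To illustrate the &'static Path point, here is a minimal sketch written as a free function on top of today's parent(). The name parent_or_dot is hypothetical; this is not a proposed std signature.

    ```rust
    use std::path::Path;

    /// Like `Path::parent`, but maps the empty parent to a static ".".
    /// No allocation is needed because "." has a 'static lifetime.
    fn parent_or_dot(p: &Path) -> Option<&Path> {
        match p.parent() {
            Some(parent) if parent.as_os_str().is_empty() => Some(Path::new(".")),
            other => other,
        }
    }
    ```

    Note that this sketch also maps the parent of "." to ".", which is the circular pathlib-style answer rather than None.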

    One possible issue might be if we ever need to support a weird platform where "." is not actually a valid path to "foo"'s parent directory. Though maybe in that case we would define some new per-platform constant, and tweak the definition of parent to say that it might return whatever that thing is?

    Jack O'Connor at 2016-09-30 20:56:35

  5. Oh gosh, yet another consideration: Components is aware of leading dots, and because of that the parent of "./foo" is currently Some("."), even though single dots are skipped elsewhere. So we have to choose between "consistency across different forms of the same relative path" and "consistency between the parent method and the components list".
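
    A small sketch of what I mean. The helper name components_of is made up; it just collects what Path::components yields.

    ```rust
    use std::path::{Component, Path};

    /// Hypothetical helper: collect the components of a path string.
    fn components_of(s: &str) -> Vec<Component<'_>> {
        Path::new(s).components().collect()
    }
    ```

    components_of("./foo") keeps the leading dot as a CurDir component, while components_of("foo/./bar") skips the interior dot and yields only foo and bar.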

    Jack O'Connor at 2016-09-30 23:03:04

  6. @nagisa We can return Path("..") in particular, just sticking ".." into a static(!).

    bluss at 2016-10-01 06:53:37

  7. @bluss can you return ../../ which is a parent of .. though? and ../../../ afterwards etc.

    Simonas Kazlauskas at 2016-10-01 11:51:08

  8. Maybe .parent() can return None when there is no meaningful substring parent (because the path is relative), and there could be a separate method .relative_parent() (name bikeshedding very welcome) that returns a Cow<Path>?
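
    One possible shape for this, purely as a sketch: relative_parent is not an existing std method, and the choices below (for example "." for a single component, None for the empty path and for roots) are assumptions rather than anything agreed on in this thread.

    ```rust
    use std::borrow::Cow;
    use std::path::{Component, Path};

    /// Hypothetical: borrow when the parent is a substring or a static
    /// "." / "..", and allocate only when another ".." must be appended.
    fn relative_parent(p: &Path) -> Option<Cow<'_, Path>> {
        match p.components().next_back()? {
            Component::Normal(_) => {
                let parent = p.parent()?;
                if parent.as_os_str().is_empty() {
                    Some(Cow::Borrowed(Path::new(".")))
                } else {
                    Some(Cow::Borrowed(parent))
                }
            }
            // Only reachable when the path is just "." (interior dots are skipped).
            Component::CurDir => Some(Cow::Borrowed(Path::new(".."))),
            // "..", "a/..": the parent needs one more "..", so allocate.
            Component::ParentDir => Some(Cow::Owned(p.join(".."))),
            Component::RootDir | Component::Prefix(_) => None,
        }
    }
    ```

    Unlike parent(), this never bottoms out for relative paths made of dots: each call on "..", "../..", and so on returns a longer owned path.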

    Jack Fransham at 2016-10-04 12:43:23

  9. That PR (#40447) caused at least one break in the Rust build itself, noted in the comments there. I might try the "synthetic ." approach at some point, though I'm not sure that will be any less breaky.

    Jack O'Connor at 2017-03-13 04:31:55

  10. The only thing we can really do is the relative_parent addition, I think, since changing what the existing method returns seems impractical (hard to catch and fix changed uses).

    Mark Rousskov at 2017-06-17 01:52:31

  11. Triage: not aware of any movement on adding relative_parent

    Steve Klabnik at 2019-09-24 01:59:10

  12. Another tricky example of the current behavior to think about for whoever decides to wrestle with this dragon:

    Path::new("a/b/c/..").parent() == Some(Path::new("a/b/c"))
    

    That's wacky because if you canonicalize both of those, it's saying that the "parent" of a/b is a/b/c.


    Thinking more about @nagisa's comment, it could be possible to have an internal global Vec<PathBuf> behind some kind of spinlock. When the path library needs something like ../../.., it could look in that vec, and add entries up to the needed length if they're not already added. We could construct a &Path using unsafe code, because the memory address of the PathBuf contents is stable.

    This raises the question of whether .parent() would ever bottom out at None for a relative path. If the answer is "no", that would create a weird split in the behavior of absolute and relative paths. It would also mean that that string storage would be unbounded in pathological cases.
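
    A minimal sketch of the cache idea, with one substitution: instead of unsafe code pointing into a Vec<PathBuf>, it leaks each entry with Box::leak and uses a std Mutex rather than a spinlock. The names DOTDOT_CACHE and dotdot_chain are hypothetical, and a const Mutex::new in a static assumes a reasonably recent compiler.

    ```rust
    use std::path::{Path, PathBuf};
    use std::sync::Mutex;

    // Entry i holds i + 1 ".." components: "..", "../..", "../../..", ...
    static DOTDOT_CACHE: Mutex<Vec<&'static Path>> = Mutex::new(Vec::new());

    /// Return a 'static path made of `n` ".." components (n >= 1).
    fn dotdot_chain(n: usize) -> &'static Path {
        assert!(n >= 1, "need at least one `..` component");
        let mut cache = DOTDOT_CACHE.lock().unwrap();
        while cache.len() < n {
            let next: PathBuf = match cache.last() {
                Some(prev) => prev.join(".."),
                None => PathBuf::from(".."),
            };
            // Leaking keeps the allocation alive for the rest of the program,
            // which is what makes the 'static borrow sound (and the storage unbounded).
            let leaked: &'static Path = Box::leak(next.into_boxed_path());
            cache.push(leaked);
        }
        cache[n - 1]
    }
    ```

    The leak makes the unbounded-storage concern above very visible: nothing is ever freed.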


    Looking at all of this together, it seems like the notion of "parent" is kind of meaningless without talking to the filesystem. This is extra true when symlinks get involved, at which point even the filesystem has two different notions of what the parent directory is (see pwd -L and pwd -P). If I were going to write a path library from the ground up, I'd probably want a clean separation between logical operations that work only on path strings in memory, and physical operations that ask the filesystem for canonical answers. (That said, I've personally seen cases where system APIs like canonicalize return errors, like VirtualBox shared drives on Windows guests. Failures in basic path queries aren't just a theoretical problem, unfortunately.) A possible set of operations might be:

    • physical_parent (resolves .. via the filesystem)
    • logical_parent (resolves .. syntactically, uses $PWD to resolve leading .. in relative paths)
    • dirname (strips the last path component off)
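
    Very roughly, and only as a sketch of the split rather than a worked-out design, the three could look like this. All three function names are hypothetical, and logical_parent takes the working directory as a parameter (the caller would pass in $PWD) instead of reading the environment itself.

    ```rust
    use std::fs;
    use std::io;
    use std::path::{Component, Path, PathBuf};

    /// Asks the filesystem: canonicalize first, then drop the last component.
    fn physical_parent(p: &Path) -> io::Result<Option<PathBuf>> {
        let canonical = fs::canonicalize(p)?;
        Ok(canonical.parent().map(Path::to_path_buf))
    }

    /// Purely syntactic: join onto `pwd`, cancel ".." against the preceding
    /// component, then drop the last component. Never touches the filesystem.
    fn logical_parent(p: &Path, pwd: &Path) -> Option<PathBuf> {
        let mut out = PathBuf::new();
        for c in pwd.join(p).components() {
            match c {
                Component::CurDir => {}
                Component::ParentDir => {
                    out.pop(); // no-op at the root
                }
                other => out.push(other.as_os_str()),
            }
        }
        if out.pop() { Some(out) } else { None }
    }

    /// Strips the last component, exactly as `Path::parent` does today.
    fn dirname(p: &Path) -> Option<&Path> {
        p.parent()
    }
    ```

    With pwd = "/w", logical_parent("a/b/c/..") gives "/w/a", whereas dirname("a/b/c/..") still gives "a/b/c".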

    Jack O'Connor at 2019-09-24 20:09:59

  13. Also more fun is Windows, where C:\foo\.. and \\?\C:\foo\.. behave differently. Because .. is resolved by the win32 subsystem and not the filesystem, the former refers to C:\ while the latter actually refers to the literal entry .. in C:\foo. So you need to look at the path prefix to understand whether .. is referring to the parent directory or not.
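
    A sketch of how one might check for that prefix from Rust. The function name has_verbatim_prefix is made up, and the check is only meaningful on Windows; on other platforms paths have no prefix component, so it always returns false.

    ```rust
    use std::path::{Component, Path};

    /// True if the path starts with a verbatim (`\\?\`) prefix, in which case
    /// ".." should not be assumed to mean "parent directory".
    fn has_verbatim_prefix(p: &Path) -> bool {
        match p.components().next() {
            Some(Component::Prefix(prefix)) => prefix.kind().is_verbatim(),
            _ => false,
        }
    }
    ```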

    Peter Atashian at 2019-09-25 04:20:08

  14. Is there a way to detect this case that is cleaner than path.parent() == Some(Path::new(""))?

    Edit: I found parent().as_os_str().is_empty()
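
    Spelled out with the Option handled, that check might look like the sketch below. The helper name has_empty_parent is made up.

    ```rust
    use std::path::Path;

    /// True when `parent()` is Some(""), i.e. a single-component relative path.
    fn has_empty_parent(p: &Path) -> bool {
        p.parent()
            .map_or(false, |parent| parent.as_os_str().is_empty())
    }
    ```

    This is true for "foo" and ".", and false for "./foo", "a/b", and "/".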

    Cameron Steffen at 2020-08-06 22:11:10

  15. I am currently using .parent() to differentiate between a relative filename ("./a" => Some(".")) and a passed command ("a" => Some("")). Returning None in the second case would probably make sense for .dirname(), I think.

    Roland Fredenhagen at 2021-04-01 16:45:02