std::process::Command hangs if piped stdout buffer fills

5654cbb
Opened by Fraser Hutchison at 2023-12-03 19:56:41

There's an issue on Windows when using Stdio::piped() for stdout: if the child process writes too many bytes to stdout, it appears to become permanently blocked. A minimal test case (echoing 2049 blank lines via cmd) is:

fn main() {
    ::std::process::Command::new("cmd")
        .args(&["/c", "for /l %i in (1, 1, 2049) do @echo["])
        .stdout(::std::process::Stdio::piped())
        .status();
}

This hangs for me, but succeeds if only 2048 echoes are specified, which is why I suspect it's related to filling the 4096-byte pipe buffer.

  1. So the issue is that .status() does not read from piped stdout/stderr, causing it to get blocked?

    Peter Atashian at 2017-10-27 08:32:50

  2. I'm guessing that's the case, yes. Or it's trying to read and failing, but I haven't looked into the code there, so not really sure if that's even possible.

    Fraser Hutchison at 2017-10-27 08:45:53

  3. I wonder if the simple fix is to have status silently convert piped to null for stdout and stderr.

    Jack O'Connor at 2018-09-19 14:40:26

  4. I believe I've hit the same issue on Ubuntu 20.04.2. Given

    # Cargo.toml
    
    [package]
    name = "sandbox-rs"
    version = "0.1.0"
    authors = ["Heliozoa"]
    edition = "2018"
    
    // src/bin/printer.rs
    
    fn main() {
        for _ in 0..65536 {
            print!("a");
        }
    }
    
    // src/main.rs
    
    fn main() {
        let mut command = std::process::Command::new("./target/debug/printer");
        let _res = command
            .stdout(std::process::Stdio::piped())
            .status()
            .unwrap();
    }
    

    the process hangs when running cargo build --bin printer && time cargo run --bin sandbox-rs. Changing the status call to output, or reducing the number of iterations in printer.rs to 65535, fixes the problem. It's pretty niche, and I only came across it while debugging another problem and trying different things out, but for what it's worth, I expected this to work without issues, mainly because everything seems perfectly fine until you hit the limit.

    Martinez at 2021-02-18 20:46:36

  5. I have the same problem on Ubuntu 20.10

    rustc --version: rustc 1.52.1 (9bc8c42bb 2021-05-09) cargo --version: cargo 1.52.0 (69767412a 2021-04-21)

    I have a random C++ generator (C++17) that produces between 1e5 and 2e5 numbers:

    #include <bits/stdc++.h>
    using namespace std;
    template <typename T>
    T random(const T from, const T to) {
        static random_device rdev;
        static default_random_engine re(rdev());
    
        using dist_type = typename conditional<
            is_floating_point<T>::value,
            uniform_real_distribution<T>,
            uniform_int_distribution<T>
        >::type;
    
        dist_type uni(from, to);
        return static_cast<T>(uni(re));
    }
    int main() {
        int n = random<int>(1e5, 2e5);
        cout << n << endl;
        for(int i=0;i<n;++i) cout << random<int>(1, 1e9) << " ";
        cout << endl;
        return 0;
    }
    

    and I am running it from Rust, using a pipe to capture the generator's output:

    // compile
    Command::new("g++")
        .arg("-std=c++17")
        .arg("-o")
        .arg("app.o")
        .arg("main.cpp")
        .status()
        .expect("Compiling C++ error");
    
    // timeout 10sec
    let timeout = 10000;
    let now: Instant = Instant::now();
    
    // run generator
    // run generator
    let mut child = Command::new("./app.o")
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .unwrap();
    
    // output file
    let stdout = Some(PathBuf::from("output.txt"));
    
    let output_file: Option<Arc<Mutex<File>>> = match stdout {
        Some(path) => {
            let output = File::create(path).unwrap();
            Some(Arc::new(Mutex::new(output)))
        },
        None => None,
    };
    
    // check timeout
    let thread: std::thread::JoinHandle<()> = std::thread::spawn(move || {
        for _ in 0..timeout {
            if let Ok(Some(_)) = child.try_wait() {
                if let Ok(response) = child.wait_with_output() {
                    // write the output data
                    if let Some(f) = output_file {
                        let mut file: std::sync::MutexGuard<File> = f.lock().unwrap();
                        file.write_all(&response.stdout).unwrap();
                    }
                }
                return;
            }
            std::thread::sleep(std::time::Duration::from_millis(1));
        }
        child.kill().unwrap();
    });
    
    thread.join().unwrap();
    let new_now: Instant = Instant::now();
    let time: Duration = new_now.duration_since(now);
    
    println!("{:?}", time);
    

    I am using .spawn(), but it still blocks. I think that's because the output exceeds the capacity of the pipe buffer and the run only finishes because of the timeout: when I generate fewer than about 1e3 numbers it works perfectly.

    Is there any way to fix this problem, or to increase the buffer capacity?

    Luis Miguel Báez at 2021-06-12 16:32:17

  6. @LuisMBaezCo the buffer size you're referring to is what Linux calls the "pipe capacity". This is an OS thing, and not something that Rust or libstd tries to mess with. If you know you want to increase the pipe capacity, you can look into the fcntl(F_SETPIPE_SZ) approach described here and here. (The equivalent can probably be done in safe Rust using the fcntl function provided by the nix crate, and using the file descriptors that you get from ChildStdout::as_raw_fd and ChildStderr::as_raw_fd.)

    However, unless you're sure that's the right approach for you, I think you should avoid relying on pipe capacity. I think the right approach here is almost always to make sure that your parent process continually clears space in the stdout and stderr pipes by reading from them. (This requires at least one helper thread if you're setting both stdout and stderr to piped, unless you want to reach for fancy async IO.) For example, spawning helper threads to continually clear space is the strategy that the duct crate uses. If you're worried that there might be too much output for the parent to fit everything in memory, and the parent can't make use of the output in some sort of streaming fashion, then the next best alternative I'd suggest is to redirect the child's stdout and stderr to files.

    Jack O'Connor at 2021-06-13 01:25:40

    ~~The Windows-specific part of this issue can be solved by using a larger buffer size. We currently set a relatively small buffer size, at least compared to typical Linux defaults.~~

    The more general issue is harder to solve within the std but maybe it can be mitigated somewhat by improving documentation and providing workarounds where we can. So my suggestions for addressing this issue would be:

    1. ~~Increase the pipe buffer size on Windows (see link above).~~ Implemented in #95782
    2. Document the potential deadlock if the child's stdout buffer is filled without being read and note how the user can avoid this situation.
    3. For the specific case of Stdio::piped() being used for the child's stdout and then Command::status being called, I think the pipe could be automatically replaced with Stdio::null() without changing any user-visible behaviour (except avoiding the deadlock). Though I'm not totally sure about this.

    @rustbot label +E-help-wanted

    Chris Denton at 2022-04-04 16:50:20