A surprisingly arcane little Unix shell pipeline example

March 4, 2019

In "The output of Linux pipes can be indeterministic" (via), Marek Gibney noticed that the following shell command has indeterminate output:

(echo red; echo green 1>&2) | echo blue

This can output any of "blue green" (with a newline between them), "green blue", or "blue"; the usual case is "blue green". Fully explaining this requires surprisingly arcane Unix knowledge.
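To see the variety for yourself, here is a small bash sketch that runs the pipeline repeatedly and tallies which of the three outputs appears. Wrapping the pipeline in '{ ...; } 2>&1' is an addition so that 'green' (written to stderr) is captured along with 'blue'. The iteration count of 200 is arbitrary, and the counts you get will depend on your machine and its scheduler, so no particular distribution is implied.

```shell
#!/bin/bash
# Tally the outcomes of the racy pipeline over many runs.
blue_green=0; green_blue=0; blue_only=0; other=0

for ((i = 0; i < 200; i++)); do
  # Merge stderr into stdout so both 'blue' and 'green' are captured.
  out=$( { (echo red; echo green 1>&2) | echo blue; } 2>&1 )
  case $out in
    $'blue\ngreen') blue_green=$((blue_green + 1)) ;;
    $'green\nblue') green_blue=$((green_blue + 1)) ;;
    blue)           blue_only=$((blue_only + 1)) ;;
    *)              other=$((other + 1)) ;;
  esac
done

echo "blue green: $blue_green"
echo "green blue: $green_blue"
echo "blue only:  $blue_only"
echo "other:      $other"
```

Any result in the "other" bucket would mean something outside the three cases is going on, such as a shell that ignores SIGPIPE and reports a write error instead.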

The "blue green" and "green blue" outputs are simply a scheduling race. The 'echo green' and 'echo blue' are being run in separate processes, and which one of them gets executed first is up to the whims of the Unix scheduler. Because the left side of the pipeline has two things to do instead of one, often it will be the 'echo blue' process that wins the race.

The mysterious case is when the output is "blue" alone, and to explain this we need to know two pieces of Unix arcana. The first is our old friend SIGPIPE: if a process writes to a pipe whose read end has been closed (so nothing can ever read the data), it normally receives a SIGPIPE signal and dies. The second is that 'echo' is a builtin command in shells today, and so the left side's 'echo red; echo green 1>&2' is actually all being handled by one process instead of the 'echo red' being its own separate process.
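Both pieces of arcana can be checked directly from bash. In this sketch, 'yes' writes forever and 'head -n 1' exits after one line, so a later write by 'yes' hits a pipe with no reader. This assumes bash (for PIPESTATUS and 'type -t') and that SIGPIPE is not being ignored by whatever started the shell; SIGPIPE is signal 13 on Linux and the BSDs, so a death by SIGPIPE shows up as exit status 128 + 13 = 141.

```shell
#!/bin/bash
# 'head' exits after one line and closes the pipe's read end;
# the next write by 'yes' raises SIGPIPE, which kills it.
yes | head -n 1
yes_status=${PIPESTATUS[0]}   # exit status of 'yes', the first pipeline element
echo "yes exited with status $yes_status"

# Ask bash how it would run 'echo': as a builtin, inside the shell process.
type echo
```

On a typical Linux system this prints 'y', then an exit status of 141 for 'yes', then "echo is a shell builtin".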

We get "blue" as the sole output when the 'echo blue' runs so soon that it exits, closing its end of the pipe, before the left side can finish 'echo red'. When this happens the left side gets a SIGPIPE and exits without running 'echo green' at all. This wouldn't happen if echo wasn't a specially handled builtin; if it was a separate command (or even if the shell forked to execute it internally), only the 'echo red' process would die from the SIGPIPE instead of the entire left side of the pipeline.
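The builtin-versus-external difference can be made visible with a sketch like the following. It assumes bash and a standalone 'echo' program reachable on PATH, run through 'env' so that the shell's builtin is bypassed. The 'sleep 0.5' is an added, arbitrary delay that makes it very likely (though not guaranteed) that 'echo blue' has already exited before anything is written to the pipe.

```shell
#!/bin/bash
# Builtin echo: the left-side subshell writes 'red' itself, gets SIGPIPE,
# and dies, so 'echo green' never runs.
builtin_out=$( { (sleep 0.5; echo red; echo green 1>&2) | echo blue; } 2>&1 )

# External echo (via env): only the separate echo process dies from SIGPIPE;
# the subshell survives and goes on to run 'echo green'.
external_out=$( { (sleep 0.5; env echo red; echo green 1>&2) | echo blue; } 2>&1 )

echo "--- with builtin echo ---"
printf '%s\n' "$builtin_out"
echo "--- with external echo ---"
printf '%s\n' "$external_out"
```

With the builtin, only "blue" should appear; with the external echo, "blue" should be followed by "green".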

So we have three orders of execution:

  1. The shell on the left side gets through both of its echos before the 'echo blue' runs at all. The output is "green blue".

  2. The 'echo red' happens before 'echo blue' exits, so the left side doesn't get SIGPIPE, but 'echo green' happens afterwards. The output is "blue green".

  3. The 'echo blue' runs and exits, closing the pipe, before the 'echo red' finishes. The shell on the left side of the pipeline writes output into a closed pipe, gets SIGPIPE, and exits without going on to do the 'echo green'. The output is "blue".
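Each of the three orders above can be nudged into happening on purpose by inserting delays. This bash sketch does that; the sleeps are additions, not part of the original pipeline, and their lengths are arbitrary values assumed to be long enough to decide each race on an ordinary, unloaded machine.

```shell
#!/bin/bash
# Order 1: delay the right side, so both left-side echos finish first.
order1=$( { (echo red; echo green 1>&2) | { sleep 0.5; echo blue; }; } 2>&1 )

# Order 2: 'red' is written while the reader is still alive, but 'green'
# is delayed until after 'echo blue' has run and exited.
order2=$( { (echo red; sleep 0.6; echo green 1>&2) | { sleep 0.2; echo blue; }; } 2>&1 )

# Order 3: delay the left side, so 'echo blue' exits before 'red' is written;
# the left side dies of SIGPIPE before reaching 'echo green'.
order3=$( { (sleep 0.5; echo red; echo green 1>&2) | echo blue; } 2>&1 )

for o in "$order1" "$order2" "$order3"; do
  printf '%s\n---\n' "$o"
done
```

Note that in order 2, 'echo green' writes to stderr rather than to the pipe, so it is unaffected by the pipe's reader having gone away.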

The second order seems to be the most frequent in practice, although I'm sure it depends on a lot of things (including whether or not you're on an SMP system). One thing that may contribute to this is that I believe many shells start pipelines left to right, i.e. if you have a pipeline that looks like 'a | b | c | d', the main shell will fork the 'a' process first, then the 'b' process, and so on. All else being equal, this will give 'a' an edge in running before 'd'.

(This entry is adapted from my comment on lobste.rs, because why not.)

Written on 04 March 2019.
Last modified: Mon Mar 4 23:55:34 2019