Investigating Some Unexpected Bash coproc Behavior

Recently, while refreshing my memory on the use of Bash's coproc feature, I came across a reference to a pitfall that described some behavior I found quite unexpected. This post describes my quick investigation of the pitfall and suggests a workaround (although I don't really recommend using it).

I came across the pitfall on the BashHackers wiki under the heading Avoid the final pipeline subshell. The example given on the wiki is:

#DOESN'T WORK
$ coproc ls
[1] 23232
$ while IFS= read -ru ${COPROC[0]} line; do printf '%s\n' "$line"; done
bash: read: line: invalid file descriptor specification
[1]+  Done                    coproc COPROC ls

What this attempts to do is run ls in the background (as a coprocess), then read from the coprocess' standard output and print the filenames output by the ls command. But, as you can see, the read loop prints nothing and produces an error stating that an invalid file descriptor is being used. The wiki contrasts this with an example using ksh (the Korn shell), which produces the output that one probably expects:

# ksh93 or mksh/pdksh derivatives
ls |& # start a coprocess
while IFS= read -rp file; do print -r -- "$file"; done # read its output
a.pdf
b.pdf
...

As you may have gathered, the Bash behavior seemed unexpected to me, and the Korn shell behavior is what I would have expected to happen. To try to figure this out a bit, I changed the code around by adding a read before the loop (which managed to get me one filename before the error message):

$ cat coproc1.sh
coproc ls *.pdf

IFS= read -ru ${COPROC[0]} line; printf '%s\n' "$line"
while IFS= read -ru ${COPROC[0]} line
do
    printf '%s\n' "$line"
done
$ bash coproc1.sh
a.pdf
coproc1.sh: line 4: read: line: invalid file descriptor specification

If you add additional read lines before the loop, you can often get additional filenames to print before you get the error. So it seemed like the pipe from the coprocess was getting closed before the script had finished reading its output. This still struck me as unexpected, so I checked the man pages for the pipe system call, pipe(2), and after that, the pipe overview man page, pipe(7), where I found the following paragraph:

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see end-of-file (read(2) will return 0). If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a SIGPIPE signal to be generated for the calling process. If the calling process is ignoring this signal, then write(2) fails with the error EPIPE.

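This is when the light bulb finally came on: the ls command writes all its output and exits (that is, closes its end of the pipe) before the Bash script has had time to read all the contents of the pipe, and at some point after that (depending on how fast the script runs) the read fails. The error message is a clue here: it complains about "line", which is consistent with Bash closing its coprocess file descriptors and unsetting COPROC once the coprocess has exited, so that the unquoted ${COPROC[0]} expands to nothing and read -u takes line as the file descriptor.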
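Two small experiments can help separate what the pipe does from what Bash does. This is only a sketch: it uses process substitution and printf as stand-ins for the coprocess and ls, and it unsets COPROC by hand to imitate the state the script appears to be in when the error shows up. The names names and err, and the use of file descriptor 3, are my own choices for the illustration.

```bash
#!/usr/bin/env bash

# Experiment 1: a pipe keeps its buffered data after the writer has exited.
# printf stands in for ls here; it finishes almost immediately.
exec 3< <(printf '%s\n' a.pdf b.pdf c.pdf)
sleep 1                      # make sure the writer is long gone
mapfile -t -u 3 names        # every line is still readable, then EOF
exec 3<&-
printf 'lines read after the writer exited: %d\n' "${#names[@]}"

# Experiment 2: what read sees when COPROC is not set.
# ${COPROC[0]} is deliberately left unquoted, as in the wiki example, so it
# expands to nothing and "line" becomes the argument to -u.
unset COPROC
err=$( { IFS= read -ru ${COPROC[0]} line; } 2>&1 </dev/null ) || true
printf '%s\n' "$err"
```

The first experiment reads all three lines even though the writer exited a second earlier, so an exited writer on its own does not lose data. The second one produces the same "invalid file descriptor specification" complaint about line as the failing coproc scripts.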

This paragraph also gave me an idea of how I might work around the problem: if I duplicate the file descriptors returned by the coproc command, then the read shouldn't encounter the situation referred to above, where all file descriptors referring to the write end of a pipe have been closed (which would cause subsequent reads to fail):

$ cat coproc2.sh
coproc ls *.pdf
exec 5<&${COPROC[0]} 6>&${COPROC[1]}
fd=5

IFS= read -ru $fd line; printf '%s\n' "$line"
while IFS= read -ru $fd line
do
    printf '%s\n' "$line"
done

exec 5<&- 6>&-
$ bash coproc2.sh
a.pdf
b.pdf
c.pdf
d.pdf
e.pdf
f.pdf
g.pdf

Now all the files get listed, and no error is produced.

Remember that duplicating file descriptors is done with exec redirections in Bash. The first exec duplicates the coprocess' file descriptors onto file descriptors 5 and 6. The last exec closes file descriptors 5 and 6.
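As a minimal sketch of how these exec redirections behave, here is the same duplicate-then-close pattern applied to an ordinary temporary file instead of a coprocess pipe. The variable names tmp and content are just for illustration. The point it shows is that a duplicate keeps the underlying file open even after the original descriptor has been closed.

```bash
#!/usr/bin/env bash

tmp=$(mktemp)

exec 5>"$tmp"    # open the file for writing on fd 5
exec 6>&5        # duplicate fd 5 onto fd 6; both refer to the same open file
exec 5>&-        # close fd 5; the file stays open through fd 6

printf 'still open\n' >&6   # writing through the duplicate still works
exec 6>&-                   # close the last descriptor

content=$(<"$tmp")
rm -f "$tmp"
printf '%s\n' "$content"
```

The same logic applies to descriptors 5 and 6 in coproc2.sh: they stay usable regardless of what happens to the descriptors stored in COPROC.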

Tip: Finding Out What Files Are Open From a Bash Script

Note that when duplicating file descriptors, it's often useful to see which files are open on which file descriptors. On Linux, you can do this quite easily from a Bash script by adding the following command to your script at the point where you want to see the file descriptors that are open:

$ ls -la /proc/$$/fd
dr-x------ 2 mitch users  0 Aug 16 13:01 .
dr-xr-xr-x 9 mitch users  0 Aug 16 13:01 ..
lr-x------ 1 mitch users 64 Aug 16 13:01 255 -> .../script.sh
lr-x------ 1 mitch users 64 Aug 16 13:01 5 -> pipe:[73893]
l-wx------ 1 mitch users 64 Aug 16 13:01 6 -> pipe:[73894]
l-wx------ 1 mitch users 64 Aug 16 13:01 60 -> pipe:[73894]
lr-x------ 1 mitch users 64 Aug 16 13:01 63 -> pipe:[73893]
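If the full ls listing is more than you need, the same information can be printed more compactly. The following is a small sketch that assumes a Linux-style /proc filesystem and the readlink utility; show_fds is a hypothetical helper name of my own, not a Bash builtin.

```bash
#!/usr/bin/env bash

# Print "fd -> target" for each descriptor open in the main shell.
# Assumes /proc/<pid>/fd exists (Linux); prints nothing elsewhere.
show_fds() {
    local entry target
    for entry in /proc/$$/fd/*; do
        # Skip entries that vanish between listing and lookup (for example,
        # the descriptor briefly used to read the directory itself).
        target=$(readlink "$entry") || continue
        printf '%s -> %s\n' "${entry##*/}" "$target"
    done
}

show_fds
```

Calling show_fds right after the exec line in coproc2.sh should show descriptors 5 and 6 pointing at the same pipes as the two descriptors held in COPROC.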

My first attempt at this approach duplicated only the file descriptor for the read end of the pipe (${COPROC[0]}). Since I'm only reading from the pipe and not writing to it, that seemed like it would be sufficient, but it still failed. Duplicating both file descriptors allowed the script to finish without error.

The main goal here was not to suggest a workaround for this Bash behavior, since this may not be a truly robust workaround. One can imagine that even this approach might fail if run on a fast enough system where the coprocess finishes and exits before the exec command has a chance to duplicate the file descriptors. So, admittedly, I haven't really come up with much that I can use every day, but I have satisfied my curiosity as to why this unexpected behavior happens.
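For what it's worth, one way to narrow that race might be to make the coprocess wait for a go-ahead line on its standard input before it produces any output, so that it cannot exit until the descriptors have been duplicated. The following is only a sketch under that assumption, not something I'd call a recommended pattern. The "go" handshake, the variable names signal and files, and the use of printf in place of ls *.pdf (to keep the example self-contained) are all my own choices for the illustration.

```bash
#!/usr/bin/env bash

# The coprocess blocks on its stdin until it receives one line, so it cannot
# finish before the parent has duplicated the descriptors.
# printf stands in for "ls *.pdf" here.
coproc { IFS= read -r signal; printf '%s\n' a.pdf b.pdf c.pdf; }

# Duplicate both ends while the coprocess is still waiting.
exec 5<&"${COPROC[0]}" 6>&"${COPROC[1]}"

# Release the coprocess, then close our duplicate of its stdin.
printf 'go\n' >&6
exec 6>&-

# Read everything through the duplicate, which stays open after the
# coprocess exits.
files=()
while IFS= read -r -u 5 line; do
    files+=("$line")
done
exec 5<&-

printf '%s\n' "${files[@]}"
```

This only works when you control the command being run as the coprocess, since it has to cooperate by waiting for the handshake.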
