Bash: Redirecting Input from Multiple Files

Recently I needed to create a script that processed two input files. By processed I mean that the script needed to get a line from one file, then get a line from the second file, and then do something with them. Sounds easy enough, but it's not that easy unless you know about some of bash's extended redirection capabilities.

For the sake of this example, let's say that we want to implement a simple version of the paste command as a bash script. The paste command reads a line from each of its input files and then pastes them together and writes the combined result to stdout as a single line. Our example version will only do this for two input files. Plus it won't do any error checking and it will assume that the files contain the same number of lines.

Our input files, file1 and file2 are:

  $ cat file1
  f1 1
  f1 2
  f1 3
  f1 4
  $ cat file2
  f2 1
  f2 2
  f2 3
  f2 4

Your first thought might be something like this:

#!/bin/bash

while read f1 <$1
do
    read f2 <$2
    echo $f1 $f2
done
If you run this though you'll see it doesn't quite do the job:
  $ sh paste-bad.sh file1 file2
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  ...
  Ctrl-C
That's because each redirection here starts anew: it reopens the file and reads the first line and you get an endless loop.

Your next thought might be to read the files in one by one and then take the buffered data and paste it together afterwards:

#!/bin/bash

i=0
while read line
do
    f1[$i]="$line"
    let i++
done <$1

i=0
while read line
do
    f2[$i]="$line"
    let i++
done <$2

i=0
while [[ "${f1[$i]}" ]]
do
    echo ${f1[$i]} ${f2[$i]}
    let i++
done
And that works:
  $ sh paste-ok.sh file1 file2
  f1 1 f2 1
  f1 2 f2 2
  f1 3 f2 3
  f1 4 f2 4
But if you're trying to do something more complicated than pasting lines together that approach might not be feasible and in any case it's cumbersome.

The other solution, is to use some more advanced redirection:

#!/bin/bash

while read f1 <&7
do
    read f2 <&8
    echo $f1 $f2
done \
    7<$1 \
    8<$2

In this version, at the end of the loop we specify multiple input redirections using the full general form of bash's input redirection: [n]<word. If no leading [n] is specified the default is 0, which is normal stdin redirection. However, by specifying a small integer in front of a redirection we can redirect multiple input files to the command, in this case the command is the while loop:

  ...
  done \
        7<$1 \
        8<$2
This causes the "while" loop to execute with file descriptor 7 open for reading on the first input file and file descriptor 8 open for reading on the second input file. Normally, you should use a number larger than 2, as 0-2 are used for stdin, stdout, and stderr.

To make the read commands work we need to use a another form of bash's redirection, in this case we use bash's ability to duplicate a file descriptor (like the C library function dup2()). File descriptor duplication allows two file descriptors to refer to the same open file. Since read normally reads from stdin and not file descriptor 7 or 8 we need a way to duplicate file descriptor 7 (or 8) on stdin, bash's file descriptor duplication does just that:

  while read f1 <&7
  ...
      read f2 <&8
  ...
Note that read also includes a -u option for specifying the file descriptor to read from if you prefer.

Bash contains similar forms of redirection for output files as well. See the bash man page for more information.

Load Disqus comments

Corporate Patron

Linode Logo

 

Pulseway Logo

Limited Time Offer

September Cover

 

Take Linux Journal for a test drive. Download our September issue for FREE.

Topic of the Week

Cloud

The cloud has become synonymous with all things data storage. It additionally equates to the many web-centric services accessing that same back-end data storage, but the term also has evolved to mean so much more.