Bash: Redirecting Input from Multiple Files

Recently I needed to create a script that processed two input files. By processed I mean that the script needed to get a line from one file, then get a line from the second file, and then do something with them. Sounds easy enough, but it's trickier than it looks unless you know about some of bash's extended redirection capabilities.

For the sake of this example, let's say that we want to implement a simple version of the paste command as a bash script. The paste command reads a line from each of its input files and then pastes them together and writes the combined result to stdout as a single line. Our example version will only do this for two input files. Plus it won't do any error checking and it will assume that the files contain the same number of lines.

Our input files, file1 and file2 are:

  $ cat file1
  f1 1
  f1 2
  f1 3
  f1 4
  $ cat file2
  f2 1
  f2 2
  f2 3
  f2 4

Your first thought might be something like this:

#!/bin/bash

while read f1 <$1
do
    read f2 <$2
    echo $f1 $f2
done

If you run this, though, you'll see it doesn't quite do the job:

  $ sh paste-bad.sh file1 file2
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  f1 1 f2 1
  ...
  Ctrl-C

That's because each time through the loop the redirection is evaluated again: it reopens the file, read gets the first line over and over, and you end up in an endless loop.
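
If you only had a single input file, the usual fix would be to redirect the loop as a whole rather than each individual read, so that successive reads consume successive lines (a minimal sketch):

while read f1
do
    echo "$f1"
done <$1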

Your next thought might be to read the files in one at a time, and then take the buffered data and paste it together afterwards:

#!/bin/bash

# Read all of the first file into the array f1
i=0
while read line
do
    f1[$i]="$line"
    let i++
done <$1

# Read all of the second file into the array f2
i=0
while read line
do
    f2[$i]="$line"
    let i++
done <$2

# Paste the buffered lines together by index
i=0
while [[ "${f1[$i]}" ]]
do
    echo ${f1[$i]} ${f2[$i]}
    let i++
done

And that works:

  $ sh paste-ok.sh file1 file2
  f1 1 f2 1
  f1 2 f2 2
  f1 3 f2 3
  f1 4 f2 4
But if you're trying to do something more complicated than pasting lines together, that approach might not be feasible, and in any case it's cumbersome.
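
As an aside, if you have bash 4 or later, the mapfile (a.k.a. readarray) builtin can do the buffering step more concisely; here's a sketch of the same buffered approach (assuming bash 4+ and, as before, files with matching line counts):

#!/bin/bash

# Requires bash 4+: slurp each file into an array in one step
mapfile -t f1 <"$1"
mapfile -t f2 <"$2"

# Paste the buffered lines together by index
for (( i = 0; i < ${#f1[@]}; i++ ))
do
    echo "${f1[$i]} ${f2[$i]}"
done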

The other solution is to use some more advanced redirection:

#!/bin/bash

# Descriptors 7 and 8 are opened on the two input files by the
# redirections attached to the end of the loop
while read f1 <&7
do
    read f2 <&8
    echo $f1 $f2
done \
    7<$1 \
    8<$2

In this version, at the end of the loop we specify multiple input redirections using the full general form of bash's input redirection: [n]<word. If no leading [n] is specified the default is 0, which is normal stdin redirection. However, by putting a small integer in front of each redirection we can attach multiple input files to a command; in this case the command is the while loop:

  ...
  done \
        7<$1 \
        8<$2

This causes the "while" loop to execute with file descriptor 7 open for reading on the first input file and file descriptor 8 open for reading on the second input file. Normally you should use descriptor numbers larger than 2, since 0-2 are already used for stdin, stdout, and stderr.
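
If you want to convince yourself that the descriptors really are attached to the two files, on Linux you can list the script's open descriptors via /proc from inside the loop (a quick sanity check, not part of the script itself):

while read f1 <&7
do
    # 7 and 8 show up as symlinks to the two input files
    ls -l /proc/$$/fd/7 /proc/$$/fd/8
    break
done \
    7<$1 \
    8<$2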

To make the read commands work we need to use another form of bash's redirection: its ability to duplicate a file descriptor (like the C library function dup2()). File descriptor duplication allows two file descriptors to refer to the same open file. Since read normally reads from stdin and not from file descriptor 7 or 8, we need a way to duplicate file descriptor 7 (or 8) onto stdin, and bash's file descriptor duplication does just that:

  while read f1 <&7
  ...
      read f2 <&8
  ...

Note that read also has a -u option for specifying the file descriptor to read from, if you prefer that form.
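
For example, the same loop could be written with -u instead of descriptor duplication (a sketch, equivalent to the script above):

#!/bin/bash

# read -u N reads from file descriptor N directly
while read -u 7 f1
do
    read -u 8 f2
    echo $f1 $f2
done \
    7<$1 \
    8<$2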

Bash contains similar forms of redirection for output files as well. See the bash man page for more information.
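
For instance, the output form [n]>word works the same way; here's a sketch that splits one input file across two output files (the names matched.out and other.out are just placeholders):

#!/bin/bash

# Lines beginning with "f1" go to descriptor 7, everything else to 8
while read line
do
    if [[ $line == f1* ]]
    then
        echo "$line" >&7
    else
        echo "$line" >&8
    fi
done <$1 7>matched.out 8>other.out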

______________________

Mitch Frazier is an Associate Editor for Linux Journal.

Comments

Helpful

Nath

People are getting held up by the example used. Forget the example, that wasn't the point of the exercise. I googled "multiple text input files bash" for an entirely different purpose (Renaming a directory full of 1000+ files with new names with several modifications in text files).

This sample was perfect for my needs. I simply modified the example to what I was doing. Thanks to the author!

Call me ignorant but...

Anonymous

Why not just use:

join file1 file2.... fileN

?

Quoting from the article....

Boscorama

For the sake of this example, let's say that we want to implement a simple version of the paste command as a bash script.

'nuff said. :-)

exec is your friend

Boscorama

Nice article. People don't use input redirection nearly as often as they should.

You can also do it using exec if you wish the newly opened descriptors to last for the life of the shell script. Also, the 'while' condition below will correctly handle a 'short' second file.


    exec 7<$1 8<$2

    while read f1 <&7 && read f2 <&8
    do
        # Good stuff goes here
    done

The exec command can also be used to 'dup' descriptors or swap them. Say you want to preserve the current stdout but use another destination for regular output in sub-commands without all that pesky redirection:


    exec 7>&1 >some_other_file

    echo "This will go to some_other_file"
    # As will this
    dmesg

    echo "However, this will go to the old stdout" >&7

The same can be done with stdin & stderr (or any other open file descriptor). :-)
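
And to put things back afterwards (a sketch, assuming descriptor 7 still holds the saved stdout from the exec above):

    # Restore the original stdout from descriptor 7, then close 7
    exec 1>&7 7>&-

    echo "Back on the original stdout"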

Have fun.

while read && read

Anonymous

+10

"while read && read" is a good tip since many forget that after "while" (and "if") can go basically anything bash accepts as a command. (Many get fooled by syntax sugar of '[' command.)

For many years I wondered what

cbm

For many years I wondered what the <& metacharacter was used for! Now I see that I could have used it many times instead of awk to merge files. Very interesting and clear explanation.

Interesting ... but easy alternatives?

asturbcn

Always interesting, but maybe it would be easier to use another tool like 'gawk'?

In the middle example, the

Russell2

In the middle example, the syntax "while [[ "${f1[$i]}" ]]" will break out of the loop early if there is a blank line in file1.

Useful things to know. :-) Nice post. TU.

True

Mitch Frazier

It will, but again this is just an example, not an attempt to be robust.
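
If you do need it to survive blank lines, one option (just a sketch) is to loop over the stored line count instead:

    for (( i = 0; i < ${#f1[@]}; i++ ))
    do
        echo "${f1[$i]} ${f2[$i]}"
    done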

Mitch Frazier is an Associate Editor for Linux Journal.
