Reading Multiple Files with Bash

Reading files is no big deal with bash: you redirect input to the script, pipe the output of another command into it, or, if the file names are known in advance, do the redirection inside the script. You could also use process substitution to pass in open files (command pipelines, actually) from the command line. Another option, the one I describe here, is to open the files and read (or write) them as you like, as you would in other programming languages.
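
For comparison, the conventional approaches can be sketched roughly as follows. The helper names count_lines and count_file and the scratch file are made up for the illustration, and process substitution assumes a system that provides /dev/fd or named pipes.

```shell
#!/bin/bash
# Hypothetical helper: counts the lines arriving on standard input.
count_lines() {
    local n=0 line
    while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}

# Hypothetical helper: counts the lines in the file named by its argument.
count_file() {
    count_lines <"$1"
}

# Scratch file for the demonstration (an assumption, not from the article).
tmp=$(mktemp)
printf 'a\nb\nc\n' >"$tmp"

from_redirect=$(count_lines <"$tmp")          # redirect input
from_pipe=$(cat "$tmp" | count_lines)         # pipe from another command
from_procsub=$(count_file <(sort -r "$tmp"))  # process substitution used as a "file"

rm -f "$tmp"
echo "$from_redirect $from_pipe $from_procsub"
```

All three calls see the same three lines; they differ only in how the data reaches the reader.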

The mechanism used here takes advantage of bash's ability to redirect input (or output) using a specific file descriptor with the following syntax:

n<file
n>file
n>>file
n<>file

The "n" here is a small integer that specifies the file descriptor to use to open the named file. If no "n" is specified then the following defaults apply:

<file           # same as 0<file
>file           # same as 1>file
>>file          # same as 1>>file
<>file          # same as 0<>file

This is of course the standard redirection stuff that is used all the time.
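
As a quick illustration of the explicit forms next to their defaults, the sketch below writes two lines to a scratch file and reads the first one back. The scratch file from mktemp and the variable names are assumptions made for the demo.

```shell
#!/bin/bash
# Scratch file for the demonstration (an assumption, not from the article).
tmp=$(mktemp)

echo "first line"  1>"$tmp"     # same as: echo "first line"  >"$tmp"
echo "second line" 1>>"$tmp"    # same as: echo "second line" >>"$tmp"

read -r first 0<"$tmp"          # same as: read -r first <"$tmp"
echo "read back: $first"

rm -f "$tmp"
```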

Given that the "n" is available, it would seem that one could open files and process them as needed. How to do it is less than obvious, but it turns out to be quite simple:

exec 7<file1
exec 8<file2

This opens file1 on file descriptor 7 for input, and file2 on file descriptor 8. Now we can read them easily with:

read data1 <&7
read data2 <&8

Notice the input redirection to read uses another special form that includes the ampersand (&) to specify that what follows is a file descriptor and not a file name.
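
A small self-contained sketch of the same idea follows. The scratch directory and its two files are assumptions made so the example can run anywhere. Each read picks up where the previous read on the same descriptor left off.

```shell
#!/bin/bash
# Scratch files for the demonstration (assumptions, not from the article).
dir=$(mktemp -d)
printf 'a1\na2\n' >"$dir/file1"
printf 'b1\nb2\n' >"$dir/file2"

exec 7<"$dir/file1"
exec 8<"$dir/file2"

read -r x1 <&7    # first line of file1
read -r y1 <&8    # first line of file2
read -r x2 <&7    # second line of file1: descriptor 7 kept its position
echo "$x1 $y1 $x2"

exec 7<&- 8<&-    # close both descriptors
rm -rf "$dir"
```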

Use file descriptors in the range 3-9. File descriptors below 3 are used for standard input, output, and error, and the ones above 9 may be used by the shell internally.

Bash has an explicit syntax for closing a file descriptor: n<&- closes an input descriptor and n>&- closes an output descriptor. Re-using a file descriptor also closes the old file before opening the new one. (08/21/2009: an earlier version of this article said there was no explicit syntax for closing files; that was incorrect --Mitch)

To close the two files opened above:

exec 7<&-
exec 8<&-
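
To see the explicit close form n<&- at work, here is a minimal sketch that opens a scratch file, closes its descriptor, and then checks that a further read fails. The scratch file and the status variable are assumptions made for the demo.

```shell
#!/bin/bash
# Scratch file for the demonstration (an assumption, not from the article).
tmp=$(mktemp)
echo "hello" >"$tmp"

exec 7<"$tmp"
read -r line <&7      # succeeds: descriptor 7 is open
exec 7<&-             # explicitly close descriptor 7

# A read from the closed descriptor fails; the braces let us silence
# the "Bad file descriptor" message bash prints for the redirection.
if { read -r again <&7; } 2>/dev/null; then
    status=open
else
    status=closed
fi

rm -f "$tmp"
echo "first read: $line, descriptor now $status"
```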

The exec is needed so that the redirection takes effect in the current shell. Without it, the redirection applies only to the command it is attached to, and the file descriptor is closed again as soon as that command completes. It may also surprise you that n<file by itself, with no command at all, is not a syntax error; it simply opens the file and immediately closes it again.
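
The difference can be checked with a short sketch like the one below, which tries the redirection first without exec and then with it. The scratch file and the variable names are assumptions, and the sketch assumes descriptor 7 is not already open when the script starts.

```shell
#!/bin/bash
# Scratch file for the demonstration (an assumption, not from the article).
tmp=$(mktemp)
echo "hello" >"$tmp"

# Redirection with no command: legal, but descriptor 7 does not stay open.
7<"$tmp"
if { read -r line <&7; } 2>/dev/null; then
    without_exec=open
else
    without_exec=closed
fi

# With exec, the redirection persists in the current shell.
exec 7<"$tmp"
if { read -r line <&7; } 2>/dev/null; then
    with_exec=open
else
    with_exec=closed
fi

exec 7<&-
rm -f "$tmp"
echo "without exec: $without_exec, with exec: $with_exec"
```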

An example of doing all this follows:

#!/bin/bash

function readfiles()
{
	local FD1=7
	local FD2=8
	local file1=$1
	local file2=$2
	local count1=0
	local count2=0
	local eof1=0
	local eof2=0
	local data1
	local data2

	# Open files. The descriptor number in a redirection must be a
	# literal, so the 7 and 8 here must match FD1 and FD2 above.
	# (Bash 4.1 and later can allocate a free descriptor
	# automatically with: exec {varname}<file)
	exec 7<"$file1"
	exec 8<"$file2"

	while [[ $eof1 -eq 0  ||  $eof2 -eq 0 ]]
	do
		if read -r data1 <&$FD1; then
			let count1++
			printf "%s, line %d: %s\n" "$file1" $count1 "$data1"
		else
			eof1=1
		fi
		if read -r data2 <&$FD2; then
			let count2++
			printf "%s, line %d: %s\n" "$file2" $count2 "$data2"
		else
			eof2=1
		fi
	done

	# Close files.
	exec 7<&-
	exec 8<&-
}

echo "Reading file1 and file2"
readfiles file1 file2

echo "Reading file3 and file4"
readfiles file3 file4


# vim: tabstop=4: shiftwidth=4: noexpandtab:
# kate: tab-width 4; indent-width 4; replace-tabs false;

The function at the top reads the files; the main code calls it on two files, then on two different files. Running the script produces:

$ bash readmult.sh
Reading file1 and file2
file1, line 1: f1 line 1
file2, line 1: f2 line 1
file1, line 2: f1 line 2
file2, line 2: f2 line 2
file1, line 3: f1 line 3
file2, line 3: f2 line 3
file1, line 4: f1 line 4
file2, line 4: f2 line 4
file1, line 5: f1 line 5
file2, line 5: f2 line 5
file1, line 6: f1 line 6
Reading file3 and file4
file3, line 1: f3 line 1
file4, line 1: f4 line 1
file3, line 2: f3 line 2
file4, line 2: f4 line 2
file3, line 3: f3 line 3
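
The script above hardcodes descriptors 7 and 8. One way to avoid that, assuming bash 4.1 or later, is to let the shell pick a free descriptor with the {varname} form of redirection. The sketch below is only an illustration; the scratch file and the variable name fd are assumptions.

```shell
#!/bin/bash
# Requires bash 4.1 or later for the {varname} redirection form.
# Scratch file for the demonstration (an assumption, not from the article).
tmp=$(mktemp)
printf 'one\ntwo\n' >"$tmp"

exec {fd}<"$tmp"          # bash picks a free descriptor (10 or above) and stores it in fd
read -r first  <&"$fd"
read -r second <&"$fd"
exec {fd}<&-              # close the descriptor whose number is stored in fd

rm -f "$tmp"
echo "$first $second"
```

Descriptors allocated this way are numbered 10 or higher, so they sit outside the 3-9 range recommended for descriptors you choose yourself.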

A similar process can be used for writing multiple output files using the n>file or n>>file syntax. This can be a time saver if you write a lot of data to the same file from many different places in your script.
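
A minimal sketch of the output side might look like this. The scratch directory, the file names out.log and err.log, and the choice of descriptors 3 and 4 are all assumptions made for the illustration.

```shell
#!/bin/bash
# Scratch directory for the demonstration (an assumption, not from the article).
dir=$(mktemp -d)

exec 3>"$dir/out.log"     # open for writing, truncating any existing file
exec 4>>"$dir/err.log"    # open for appending

# Writes from different places in the script go to the already-open files.
echo "step 1 done" >&3
echo "warning: something looks odd" >&4
echo "step 2 done" >&3

exec 3>&- 4>&-            # close both output descriptors

out_lines=$(grep -c '' "$dir/out.log")
err_lines=$(grep -c '' "$dir/err.log")
rm -rf "$dir"
echo "out.log lines: $out_lines, err.log lines: $err_lines"
```

Because each file is opened once, later writes do not need to repeat the file name or worry about truncating what was written earlier.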

Mitch Frazier is an embedded systems programmer at Emerson Electric Co. Mitch has been a contributor to and a friend of Linux Journal since the early 2000s.
