Reading Multiple Files with Bash
Reading files is no big deal with bash. You can redirect the input to the script, pipe the output of another command into it, or, if the file names are predetermined, open them inside the script. You can also use process substitution to pass in open files (actually command pipelines) from the command line. Another option, the one I describe here, is to open the files yourself and read (or write) them as you like, as you would in other programming languages.
The mechanism used here takes advantage of bash's ability to redirect input (or output) using a specific file descriptor with the following syntax:
```bash
n<file
n>file
n>>file
n<>file
```
The "n" here is a small integer that specifies the file descriptor to use to open the named file. If no "n" is specified then the following defaults apply:
```bash
<file   # same as 0<file
>file   # same as 1>file
>>file  # same as 1>>file
<>file  # same as 0<>file
```
This is of course the standard redirection stuff that is used all the time.
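The n<>file form is the least familiar of the four. Below is a minimal sketch of it, assuming a throwaway temporary file. It opens the file read-write on descriptor 3, reads the first line, and then writes at the current offset. The write overwrites the second line in place rather than appending.

```bash
#!/bin/bash
# Sketch: open a file read-write on descriptor 3 with the n<>file form.
# The temporary file and its contents are made up for illustration.
tmp=$(mktemp)
printf 'abc\ndef\n' > "$tmp"

exec 3<>"$tmp"           # open read-write; the offset starts at 0
IFS= read -r first <&3   # consumes "abc\n"; the shared offset is now 4
printf 'XYZ\n' >&3       # overwrites "def\n" in place at offset 4
exec 3<&-                # close descriptor 3

content=$(<"$tmp")       # file now holds "abc" and "XYZ"
echo "first line read: $first"
printf '%s\n' "$content"
rm -f "$tmp"
```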
So, given that the "n" is there, it would seem that one could easily open files and process them as needed. How to actually do it is less than obvious, but it turns out to be quite simple:
```bash
exec 7<file1
exec 8<file2
```
This opens file1 on file descriptor 7 for input, and file2 on file descriptor 8. Now we can read them easily with:
```bash
read data1 <&7
read data2 <&8
```
Notice that the input redirection for read uses another special form, <&, in which the ampersand (&) specifies that what follows is a file descriptor and not a file name.
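The reason for holding a descriptor open is that successive reads share one file offset. Here is a minimal sketch contrasting the two approaches, assuming a temporary file as input. Redirecting from the file name reopens the file on every read. Redirecting from an open descriptor continues where the previous read stopped.

```bash
#!/bin/bash
# Sketch: compare reading by file name with reading through an open
# descriptor. The temporary file stands in for a real input file.
f=$(mktemp)
printf 'line 1\nline 2\nline 3\n' > "$f"

# Redirecting from the file name reopens the file for each read,
# so both reads start at the beginning and get the first line.
IFS= read -r a <"$f"
IFS= read -r b <"$f"
echo "by name:       $a / $b"

# Redirecting from descriptor 7 reuses one open file and its offset,
# so each read picks up where the previous one stopped.
exec 7<"$f"
IFS= read -r c <&7
IFS= read -r d <&7
exec 7<&-
echo "by descriptor: $c / $d"
rm -f "$f"
```

The first echo shows line 1 twice. The second shows line 1 followed by line 2.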
Use file descriptors in the range 3-9. Descriptors 0, 1, and 2 are standard input, output, and error, and descriptors above 9 may be used internally by the shell.
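Newer versions of bash (4.1 and later) offer a different technique that sidesteps choosing a number at all. With the {varname}<file form, bash allocates a free descriptor of 10 or above and stores its number in the variable. The sketch below assumes such a bash. The variable name in_fd and the temporary file are arbitrary.

```bash
#!/bin/bash
# Sketch (requires bash >= 4.1): let bash pick a free descriptor
# instead of hardcoding 7 or 8. "in_fd" is an arbitrary variable name.
f=$(mktemp)
printf 'hello\n' > "$f"

exec {in_fd}<"$f"              # bash stores the chosen descriptor (>= 10) in in_fd
IFS= read -r line <&"$in_fd"   # read through whatever descriptor was chosen
echo "read '$line' from descriptor $in_fd"
exec {in_fd}<&-                # close the descriptor named by in_fd
rm -f "$f"
```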
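Re-using a file descriptor closes the file already open on it before the new file is opened. (08/21/2009: this article originally said there was no explicit syntax for closing a file; that is incorrect, see the comments below --Mitch)
To close the files explicitly, use the n<&- form (n>&- also works):
```bash
exec 7<&-
exec 8<&-
```
Redirecting the descriptors from /dev/null instead (exec 7</dev/null) releases file1 and file2, but it leaves descriptors 7 and 8 open on /dev/null rather than closing them.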
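To see the difference between redirecting from /dev/null and actually closing a descriptor, you can probe whether a descriptor is open. The minimal sketch below duplicates the descriptor in a subshell and checks whether that fails. The helper name fd_state is made up for this example.

```bash
#!/bin/bash
# Sketch: probe whether a descriptor is open by duplicating it in a
# subshell. fd_state is a hypothetical helper, not a bash builtin.
fd_state() {
    if ( : <&"$1" ) 2>/dev/null; then echo open; else echo closed; fi
}

f=$(mktemp)
exec 7<"$f"
s1=$(fd_state 7)    # open on the temporary file
exec 7</dev/null    # file released, but descriptor 7 still open (on /dev/null)
s2=$(fd_state 7)
exec 7<&-           # descriptor 7 actually closed
s3=$(fd_state 7)
echo "after open: $s1, after /dev/null: $s2, after <&-: $s3"
rm -f "$f"
```

This should report open, open, and then closed.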
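The exec is what makes the redirection apply to the current shell. Without it, a redirection applies only to the command it is attached to and is undone as soon as that command completes, so the descriptor would not stay open for later read commands. It may also surprise you that n<file on a line by itself is not a syntax error. Bash performs the redirection, so the file must exist and be readable, but because there is no command the redirection does not affect the current shell and the descriptor is not left open.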
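A minimal sketch of that behavior follows, assuming a fresh shell in which descriptor 7 starts out closed. It uses the same subshell probe as a check. The bare 7<file line leaves nothing open, while the exec form does.

```bash
#!/bin/bash
# Sketch: a redirection with no command is performed and then undone,
# while the same redirection on exec persists in the current shell.
# Assumes descriptor 7 is not already open when the script starts.
f=$(mktemp)

7<"$f"                      # legal, but fd 7 is not kept afterward
if ( : <&7 ) 2>/dev/null; then bare=open; else bare=closed; fi

exec 7<"$f"                 # applied to the current shell
if ( : <&7 ) 2>/dev/null; then with_exec=open; else with_exec=closed; fi

exec 7<&-
rm -f "$f"
echo "bare: $bare, exec: $with_exec"
```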
An example of doing all this follows:
```bash
#!/bin/bash

function readfiles()
{
	local FD1=7
	local FD2=8
	local file1=$1
	local file2=$2
	local count1=0
	local count2=0
	local eof1=0
	local eof2=0
	local data1
	local data2

	# Open files.
	# ***** 08/22/2009: See comments below for a way to avoid *****
	# ***** hardcoding the file descriptors -- Mitch           *****
	exec 7<"$file1"
	exec 8<"$file2"

	# Read a line from each file in turn until both reach end-of-file.
	while [[ $eof1 -eq 0 || $eof2 -eq 0 ]]
	do
		# IFS= and -r keep leading whitespace and backslashes intact.
		if IFS= read -r data1 <&$FD1; then
			let count1++
			printf "%s, line %d: %s\n" "$file1" $count1 "$data1"
		else
			eof1=1
		fi
		if IFS= read -r data2 <&$FD2; then
			let count2++
			printf "%s, line %d: %s\n" "$file2" $count2 "$data2"
		else
			eof2=1
		fi
	done

	# Close files.
	exec 7<&-
	exec 8<&-
}

echo "Reading file1 and file2"
readfiles file1 file2
echo "Reading file3 and file4"
readfiles file3 file4

# vim: tabstop=4: shiftwidth=4: noexpandtab:
# kate: tab-width 4; indent-width 4; replace-tabs false;
```
The function at the top reads a line from each file in turn until both files are exhausted. The main code calls it for file1 and file2, then again for file3 and file4. Running the script produces:
```
$ bash readmult.sh
Reading file1 and file2
file1, line 1: f1 line 1
file2, line 1: f2 line 1
file1, line 2: f1 line 2
file2, line 2: f2 line 2
file1, line 3: f1 line 3
file2, line 3: f2 line 3
file1, line 4: f1 line 4
file2, line 4: f2 line 4
file1, line 5: f1 line 5
file2, line 5: f2 line 5
file1, line 6: f1 line 6
Reading file3 and file4
file3, line 1: f3 line 1
file4, line 1: f4 line 1
file3, line 2: f3 line 2
file4, line 2: f4 line 2
file3, line 3: f3 line 3
```
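The input files themselves are not shown. The sketch below recreates files consistent with the sample output: file1 has 6 lines, file2 has 5, file3 has 3, and file4 has 2. It writes them into the current directory. The make_file helper is hypothetical and not part of the original script.

```bash
#!/bin/bash
# Sketch: create sample input files matching the output above.
# make_file is a hypothetical helper: make_file NAME PREFIX COUNT
make_file() {
    local name=$1 prefix=$2 count=$3 i
    for (( i = 1; i <= count; i++ )); do
        printf '%s line %d\n' "$prefix" "$i"
    done > "$name"
}

make_file file1 f1 6
make_file file2 f2 5
make_file file3 f3 3
make_file file4 f4 2
```

After running this in the same directory as the script, bash readmult.sh should print the output shown above.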
A similar technique can be used to write multiple output files with the n>file or n>>file syntax. This can be a time saver if you're writing a lot of data to the same file from many different places in your script.
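Here is a minimal sketch of the output side, assuming two placeholder files in a temporary directory. It truncates one file with n>file and appends to the other with n>>file. It then writes to whichever descriptor applies, without repeating the file names.

```bash
#!/bin/bash
# Sketch: keep two output files open on descriptors 3 and 4 and write
# to them from anywhere in the script. File names are placeholders.
dir=$(mktemp -d)
exec 3>"$dir/even.txt"      # create/truncate, the n>file form
exec 4>>"$dir/odd.txt"      # create/append, the n>>file form

for (( i = 1; i <= 6; i++ )); do
    if (( i % 2 == 0 )); then
        echo "$i" >&3
    else
        echo "$i" >&4
    fi
done

exec 3>&- 4>&-              # close both descriptors

mapfile -t evens < "$dir/even.txt"
mapfile -t odds < "$dir/odd.txt"
echo "even: ${evens[*]}"
echo "odd:  ${odds[*]}"
rm -rf "$dir"
```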
| Attachment | Size |
|---|---|
| readmult.tgz | 634 bytes |
Mitch Frazier is an Associate Editor for Linux Journal.
Comments
No mention in the manual? Huh?
You are aware that the documentation for Bash is in Texinfo?
And even then, the manual page says:
Duplicating File Descriptors
The redirection operator
[n]<&word
is used to duplicate input file descriptors. If word expands to one or more digits, the file descriptor denoted by n is made to be a copy of that file descriptor. If the digits in word do not specify a file descriptor open for input, a redirection error occurs. If word evaluates to -, file descriptor n is closed.
Thanks
Guess I missed that part. Now that I check the man page a bit closer, I see that it is in there. It doesn't explicitly state that "[n]>&-" closes "n" (which is what the comment below referred to), although that does appear to work as well.
Mitch Frazier is an Associate Editor for Linux Journal.
From the Man Page
As a reference, the man page describes what exec is doing:
Mitch Frazier is an Associate Editor for Linux Journal.
exec syntax
Could you have written:
exec 7<$file1
as:
exec $FD1<$file1
?
Why would you hard-code the file descriptor value in the exec line when it was already defined in a variable? Does the exec command not like that for some reason?
It Doesn't Like It
Unfortunately that doesn't work; you get:
The shell does substitute the value of $FD1, but it only recognizes n< as a redirection when n is a literal number at parse time. After the substitution it doesn't reparse the 7 as a file descriptor, so exec treats it as a command. The message is saying that the command 7 is not found.
Mitch Frazier is an Associate Editor for Linux Journal.
Just As I Wrote That...
Just as I hit submit I realized how to make that work:
```bash
# Open files.
# The escaped quotes keep file names containing spaces intact after eval.
eval "exec $FD1<\"\$file1\""
eval "exec $FD2<\"\$file2\""
```
Mitch Frazier is an Associate Editor for Linux Journal.
Benchmarks
Comparing the use of file descriptors to standard redirection:
http://www.los-gatos.ca.us/davidbu/faster_sh.html
Closing file descriptors
There is an explicit syntax for closing a file descriptor. If you want to close descriptor 7:
```bash
exec 7>&-
```
I recommend avoiding this with descriptors 0-2, since many programs will behave erratically if run with these descriptors closed.
Interesting
Thanks. That does appear to do what you describe. You can test it with the script:
Which should produce something like:
As you can see, in the second output from lsof the file on file descriptor 7 is now closed.
The most interesting thing about this is that it doesn't appear to be in the man page anywhere; the closest thing I see is:
So I guess it's a special form of that. If you check back, leave a note as to where you found that documented. Thanks again.
Mitch Frazier is an Associate Editor for Linux Journal.
It's also more efficient
Just an additional note: using file descriptors is also more efficient (reducing processing time by about 5x), because the file is not implicitly opened and closed around every operation.
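You can get a feel for that difference on your own system with a minimal sketch like the one below. It appends the same lines once with a per-line >> redirection and once through a descriptor held open with exec. The line count and file names are arbitrary, and the timings (and the speedup) will vary by system, so the only fixed check is that both files end up identical.

```bash
#!/bin/bash
# Sketch: compare reopening a file for every write with holding it open.
# n and the file names are arbitrary; time output goes to stderr.
dir=$(mktemp -d)
n=10000

per_line() {        # opens and closes the file for every echo
    local i
    for (( i = 0; i < n; i++ )); do
        echo "line $i" >> "$dir/per_line.txt"
    done
}

held_open() {       # opens the file once on descriptor 3
    local i
    exec 3>>"$dir/held_open.txt"
    for (( i = 0; i < n; i++ )); do
        echo "line $i" >&3
    done
    exec 3>&-
}

time per_line
time held_open
cmp -s "$dir/per_line.txt" "$dir/held_open.txt" && echo "same contents"
```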