A Secure Bioinformatics Linux Lab in an Educational Research Environment
In delivering a new bioinformatics curriculum in the
Graduate School at the University of Medicine and Dentistry of New
Jersey, we undertook the challenge of incorporating new computational
resources over an existing research support infrastructure, adding new
services and platforms and reacting to an increasingly burdensome
responsibility to protect ourselves from network threats. Our new
environment spans two cities and links Linux workstations, Linux
servers, Silicon Graphics workstations, a Sun 6800 Enterprise Server and
the Internet. Open-source solutions combined with selective use of
commercial resources integrate in a cost-effective, service-friendly,
bioinformatics research environment. In this report, we describe
solutions to a set of challenges in our core, Linux-driven server/client
environment.
As with many universities, our public computer labs are
Microsoft boxes with the Office suite, and we have a set of clients--Web,
secure telnet, secure FTP, IMAP2 mail and X. The bioinformatics
software the university hosted lay behind these workstations, on
Sun/Solaris and SGI/Irix servers. We needed an environment in which we could do several
things: (1) manage workstations efficiently, (2) quickly add or delete
applications, (3) rebuild workstations, (4) ensure availability and
storage and (5) address network and data security issues.
We recognized that software and configuration information
should be stored in a centralized server and available to authenticated
clients. Our generic Web and e-mail servers already were overburdened
with these services. In addition, each of those servers faced
its own distinctive security threats and solutions. A better approach
would be to establish a separate server dedicated to serving the
scientific community, a scientific server. We needed to bring this
project in on a modest budget.
Almost any server/workstation environment dedicated to scientific
research might have offered multiple benefits, including parallel
processing, centralized administration and secure storage systems.
However, many fail in an important aspect in our two-city arena. We have
a high demand for visualization, and users need X server software such as
ReflectionX, Exceed and Cygwin on their desktops. An X server displays the
graphical interface of a program that runs on a remote server as an X client. Most
molecular modeling software requires visualization using OpenGL. In a
local area network, this kind of architecture should suffice. However,
our intercampus network was not always up to the task.
Our solution to this set of challenges was to build a bioinformatics
computer lab environment dedicated to teaching and research. This lab is
designed to be secure, resilient to attacks and failure and adaptable
to an array of software and modes of access by authenticated users.
We began with the operating system choice. We chose Linux for many of
the usual reasons: it is open source, secure, easily manageable and freely
available, which made it attractive in an educational environment with
limited funds. But in that economic mood, we still chose one step up,
selecting Red Hat Enterprise Linux due to the support that commercial
systems provide, including workstation monitoring, patches and upgrades
using the Red Hat Network. We went with Intel
x86 computers because we had a number on hand and they made good
economic sense. Plus, if we were to fail, we still would have boxes that otherwise
could be deployed.
As now deployed, our Piscataway lab has 14 Red Hat Enterprise Linux
workstations and two Enterprise Linux servers. In the Newark lab, where
the facility is smaller, we have four Red Hat Enterprise Linux workstations
and one Enterprise Linux server. All Piscataway workstations are identical
in terms of hardware, as are all Newark workstations; there are minor
differences between the two sets, however.
Servers
We outlined a set of initial tasks: build a server to run
DHCP, host Red Hat CDs for Kickstart installations, authenticate users,
host users' home directories and provide a Web and database server. To
that end, we did an installation of Red Hat Enterprise Linux AS on two
separate PCs in Piscataway and one in Newark to act as our servers--one
primary, one backup and one Web/database server.
We also needed DHCP services to give authorized users network access from
their personal laptops, rather than to run our workstations. To accommodate
the laptop users, we recorded the MAC address of each user's laptop,
and for each laptop we have an entry similar to the following,
where each host is identified by the user's username.
subnet 192.168.1.0 netmask 255.255.255.0 {
    deny unknown-clients;
    # DHCP range
    range 192.168.1.240 192.168.1.245;
    # known clients
    host golharam { hardware ethernet 00:12:34:56:78:90; }
    ...
}
Using Kickstart
No academic lab is supportable if each workstation must be built and maintained separately.
Toward this end, we used the Kickstart feature of Red Hat, initially
installing Linux on a single machine to get an idea of what our
Kickstart configuration file would look like. We used that experience as
a starting point.
Using hardware that was slightly different between the
campuses, we recognized we would end up with multiple configuration
files. Managing several configuration files is not a big problem, but it
does introduce a point where the configurations can get out of sync when
multiple hardware flavors are not updated all at once. In order to
correct for this complication, we created a Perl script to read a
Kickstart template file and generate the necessary Kickstart
configuration files. The template we used is a Kickstart configuration
file with minor additions.
In our scripts, in any place where we had different options depending on
the host (such as using DHCP or assigning a static IP address), we
preceded the section with a colon (:) followed by the hostnames to which the
section applied. When a section was complete, it ended with a
colon-period (:.) combination. See the Resources section at the end for
a link to our Kickstart template.
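A minimal sketch of how a template marked up this way could be expanded, written in Python rather than Perl and not the mkkickstart script itself. It assumes that each marker sits on a line of its own and that the hostnames after the colon are separated by spaces or commas; the host glycine and its static address are hypothetical.

```python
def expand_template(template, hostname):
    """Return the Kickstart text for one host from a marked-up template."""
    kept = []
    emitting = True  # lines outside any host section apply to every machine
    for line in template.splitlines():
        marker = line.strip()
        if marker == ":.":
            emitting = True  # colon-period closes the host-specific section
        elif marker.startswith(":"):
            # assumed syntax: ":host1 host2" or ":host1,host2"
            hosts = marker[1:].replace(",", " ").split()
            emitting = hostname in hosts
        elif emitting:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Toy template: alanine and proline use DHCP, the hypothetical glycine is static.
TEMPLATE = """\
lang en_US
:alanine proline
network --bootproto=dhcp
:.
:glycine
network --bootproto=static --ip=192.168.1.10
:.
%packages
"""

if __name__ == "__main__":
    print(expand_template(TEMPLATE, "alanine"), end="")
```

Lines outside any marked section, such as the packages section, end up in every generated file, which is what keeps the shared parts of the configurations in sync.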
You might notice we have three different options for the video card and
monitor resolution. One host, alanine, has a 15" flat-panel and uses the
Intel 845 board. Four other hosts--isoleucine, tryptophan, tyrosine and
proline--have 17" flat-panels using the Intel 845 board. Another group of
Piscataway machines have 17" flat-panels and carry the Intel 865 board.
We have several sections in our template file specifying, for example,
network settings, partition information, printer information and Red Hat
Network registration. Notice, however, that the packages section is
identical for all the machines. With our Perl script, mkkickstart, we
now can generate the configuration files that keep everything in sync. It
takes one parameter: the name of the machine to build for, or all to
build every configuration. See the Resources section at the end
for a link to our Kickstart Perl script.
To make the installations unattended, we put the
contents of the Enterprise Workstation CDs on an NFS share on the
server. We used the NFS share /products and copied the CDs into
/products/RedHat and the Kickstart configuration file into /products.
The benefit of using Kickstart is that you can use one configuration
file to build a group of machines. This works well when the machines
obtain network information through DHCP. The downside appears when each
machine needs a specific IP address: the address has to be written into
the configuration file, which leads to one Kickstart file per machine.
Our solution is to use a single Kickstart configuration file, in
which the machines obtain their networking information by way of DHCP. On the
DHCP server, we identified each machine by its MAC address and assigned
a static IP and hostname to that MAC address. This allows us to specify
static IP addresses using DHCP. A portion of our DHCP
configuration is shown below for one machine.
host hydrogen {
    option host-name "hydrogen";
    hardware ethernet 00:0D:56:0A:60:0B;
    fixed-address 192.168.1.161;
}
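With more than a dozen machines, entries like this one are easy to generate from a table instead of typing them by hand. The following Python sketch shows one way to do that; the helper name host_stanza is hypothetical, and the second machine (helium) with its MAC and IP address is made up for illustration.

```python
def host_stanza(name, mac, ip):
    """Format one dhcpd.conf host entry that pins a fixed address to a MAC."""
    return "\n".join([
        f"host {name} {{",
        f'    option host-name "{name}";',
        f"    hardware ethernet {mac};",
        f"    fixed-address {ip};",
        "}",
    ])


MACHINES = [
    ("hydrogen", "00:0D:56:0A:60:0B", "192.168.1.161"),  # the entry shown above
    ("helium", "00:0D:56:0A:60:0C", "192.168.1.162"),    # hypothetical machine
]

if __name__ == "__main__":
    for machine in MACHINES:
        print(host_stanza(*machine))
```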
In Newark, where the machines are on a different subnet that has a DHCP
server we cannot configure, a Kickstart configuration file is provided
that contains the specific networking information for each machine.
In building a machine using Kickstart, we need to point Anaconda (the
Red Hat installer) to the location of the configuration file. Because it
resides on the server in an NFS share, Anaconda needs to be able to
access the server. In order to access the server, Anaconda needs the
necessary network drivers. When initially booting the machines from CD,
the network drivers are located on the CD and can be loaded. However,
part of the reason for using Kickstart and putting the CDs on the server
was so we would not need a CD.
Boot Disk
In order to circumvent the need for the CD, we created a
bootable floppy disk. An image of a boot floppy exists on the first RH CD,
so we used that to build a bootable floppy. Unfortunately, the network
drivers do not fit on the floppy. Our solution was to use USB
memory sticks.
Memory sticks are recognized as USB hard drives and typically can be
used as boot devices. Our memory sticks have a capacity of 64MB each, and
newer ones hold 256MB and up. That is enough room for the boot image from
/RedHat/isolinux on the first CD plus the initrd.img from
/RedHat/images/pxeboot on the same CD. The pxeboot initrd.img
contains the necessary network drivers. The exact steps we used to build
a bootable memory stick are as follows. Many thanks to the folks on the
Red Hat mailing list for this.
- 1. Format the USB stick as one big FAT partition: mkdosfs /dev/sdb1
- 2. Mount the memory stick
- 3. Copy everything from /RedHat/isolinux to the memory stick. You can
omit isolinux.bin, boot.cat and TRANS.TBL
- 4. Rename isolinux.cfg as syslinux.cfg
- 5. Copy the initrd.img from /RedHat/images/pxeboot to the memory stick.
(At present I believe the two initrd.img files are the same)
- 6. Unmount the memory stick
- 7. Make the memory stick bootable with syslinux /dev/sdb1
We now were able to boot from the memory stick.
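The seven steps above can be collected into a small script. This is a sketch in Python, not something we ran in the lab: the function names are hypothetical, the mount point is whatever directory you pass in, and build_stick must run as root and erases the partition it is given. The file selection is split into plan_copies so it can be checked without touching a device.

```python
import os
import posixpath
import shutil
import subprocess

SKIPPED = {"isolinux.bin", "boot.cat", "TRANS.TBL"}  # files step 3 says to omit


def plan_copies(isolinux_files, isolinux_dir, pxeboot_dir, mount_point):
    """List (source, destination) pairs for steps 3 to 5."""
    copies = []
    for name in isolinux_files:
        if name in SKIPPED:
            continue
        # step 4: syslinux expects its configuration under a different name
        target = "syslinux.cfg" if name == "isolinux.cfg" else name
        copies.append((posixpath.join(isolinux_dir, name),
                       posixpath.join(mount_point, target)))
    # step 5: the pxeboot initrd.img goes last so it replaces any earlier copy
    copies.append((posixpath.join(pxeboot_dir, "initrd.img"),
                   posixpath.join(mount_point, "initrd.img")))
    return copies


def build_stick(device, mount_point, isolinux_dir, pxeboot_dir):
    """Run all seven steps. Needs root; destroys the data on `device`."""
    subprocess.run(["mkdosfs", device], check=True)              # step 1
    subprocess.run(["mount", device, mount_point], check=True)   # step 2
    try:
        names = sorted(n for n in os.listdir(isolinux_dir)
                       if os.path.isfile(os.path.join(isolinux_dir, n)))
        for src, dst in plan_copies(names, isolinux_dir, pxeboot_dir, mount_point):
            shutil.copy(src, dst)                                # steps 3 to 5
    finally:
        subprocess.run(["umount", mount_point], check=True)      # step 6
    subprocess.run(["syslinux", device], check=True)             # step 7
```

A call such as build_stick("/dev/sdb1", "/mnt/usb", "/RedHat/isolinux", "/RedHat/images/pxeboot") would mirror the list, with /mnt/usb standing in for your own mount point.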
Custom Products Install
Once the machines were built, a script in the
NFS share /products was run as root to perform an automated
installation of applications. The script installed Perl modules, compiled and
installed applications and installed custom RPM files. It also modified
some of the configuration files to add other NFS
shares, firewall settings and environment variables.
User Accounts
We needed to establish user accounts and login
protocols. We first used NIS but quickly grew unhappy with security
issues, principally because the port NIS uses could not be made static,
preventing us from establishing our firewall. Another security
issue with NIS is that NIS passwords are sent unencrypted over the
network, allowing anyone to capture passwords. We'd taken the NIS route
fearing that establishing an LDAP server would be a more time-consuming
task. In the setup of NIS, the NIS HOW-TO proved to be useful in
getting things running.
The migration and setup of an LDAP server did not turn out to be as
difficult as first expected, as an established
migration path with Perl scripts already was available. For the most part, the
migration to LDAP went smoothly thanks to the LDAP HOW-TO.
The tools for adding, modifying and deleting users from LDAP exist, but they
are not as well integrated as those for NIS. For example, to add a user,
/usr/sbin/useradd needs to be called first to generate the necessary
user information and create the home directory. The information from
/etc/passwd and /etc/shadow needs to be extracted. Once done, the user
then could be added to LDAP. Thus, we created a Perl script to perform
this task for us. See the Resources section at the end for a link to our LDAP
useradd Perl script.
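The conversion step can be illustrated with a short Python sketch. It is not the Perl script linked in Resources; it assumes the entry layout that the LDAP migration scripts commonly produce (an ou=People container and the posixAccount and shadowAccount object classes), and the base DN dc=example,dc=edu, the UID 500 and the hash value are placeholders.

```python
def passwd_to_ldif(passwd_line, crypt_hash, base_dn="dc=example,dc=edu"):
    """Build an LDIF entry for one user from /etc/passwd and /etc/shadow data."""
    # /etc/passwd has seven colon-separated fields; the second is only a placeholder
    name, _unused, uid, gid, gecos, home, shell = passwd_line.strip().split(":")
    full_name = gecos.split(",")[0] or name  # fall back to the login name
    entry = [
        f"dn: uid={name},ou=People,{base_dn}",
        f"uid: {name}",
        f"cn: {full_name}",
        "objectClass: account",
        "objectClass: posixAccount",
        "objectClass: shadowAccount",
        "objectClass: top",
        f"userPassword: {{crypt}}{crypt_hash}",  # hash copied from /etc/shadow
        f"loginShell: {shell}",
        f"uidNumber: {uid}",
        f"gidNumber: {gid}",
        f"homeDirectory: {home}",
    ]
    return "\n".join(entry) + "\n"


if __name__ == "__main__":
    line = "golharam:x:500:500:Ryan Golhar:/home/golharam:/bin/bash"
    print(passwd_to_ldif(line, "HASH-FROM-SHADOW"), end="")
```

The resulting text could then be handed to a tool such as ldapadd to create the account in the directory.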
The content of that script was derived by massaging the migration
scripts that come with LDAP. In an NIS environment users would change passwords by running
passwd <username>.
LDAP has no equivalent command, so we created another script to perform
this function. See the Resources section at the end for a link to our
LDAP passwordchange Perl script.
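For illustration, a password change can also be expressed as an LDIF modify record built directly in a script. The Python sketch below is an assumption about how that could look, not a description of the linked Perl script: it uses the salted SHA-1 ({SSHA}) format that OpenLDAP understands, and the function names and the example DN are hypothetical.

```python
import base64
import hashlib
import os


def ssha_hash(password, salt=None):
    """Return a salted SHA-1 value in the {SSHA} form used in LDAP directories."""
    if salt is None:
        salt = os.urandom(4)  # fresh random salt for every new password
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "{SSHA}" + base64.b64encode(digest + salt).decode("ascii")


def check_ssha(password, stored):
    """Verify a password against a stored {SSHA} value."""
    raw = base64.b64decode(stored[len("{SSHA}"):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes long
    return hashlib.sha1(password.encode("utf-8") + salt).digest() == digest


def password_change_ldif(user_dn, new_password):
    """Build an LDIF record that replaces the user's userPassword attribute."""
    return "\n".join([
        f"dn: {user_dn}",
        "changetype: modify",
        "replace: userPassword",
        f"userPassword: {ssha_hash(new_password)}",
    ]) + "\n"


if __name__ == "__main__":
    dn = "uid=golharam,ou=People,dc=example,dc=edu"  # placeholder DN
    print(password_change_ldif(dn, "new-secret"), end="")
```

A record like this could be applied with ldapmodify by a user or administrator who has write access to the userPassword attribute.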
We have yet to devise a mechanism for deleting users. The only way to do
this is to use one of the GUI tools available for LDAP, such as
DirectoryAdministrator or GQ. We look forward to having LDAP user
authentication integrated into Linux more thoroughly, as has been the
case for NIS and flat files.
NFS and Secure Storage
As mentioned previously, users' home directories
are mounted on the clients from the server. Initially, this directory
strategy worked well. We continue to have some concerns, however, about the
reliability and security of mounting directories across the network between
our two campuses. We currently are researching other distributed
filesystems to use for this purpose, including Lustre and InterMezzo.
A great advantage of a distributed filesystem would be to allow us to
reclaim unused disk space on the workstations. For example, in our
Piscataway lab, the workstations each have 120GB of hard drive space
and less than 20GB actually is being used for OS and applications. This
means that each machine is wasting 100GB of space, and with 14
machines, that is over 1TB of unused space. Ideally we would
like to make all that space look like one large drive available to all
the machines. Users' home directories then could be moved to this
distributed space, thus increasing the amount of space each user can use
and relaxing some of the responsibilities of the primary server.
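The estimate of reclaimable space is simple arithmetic on the Piscataway figures given above, written out here in Python. The 20GB figure is an upper bound on what the OS and applications use, so the result is a lower bound on the unused space.

```python
WORKSTATIONS = 14   # Piscataway lab workstations
DISK_GB = 120       # hard drive space per workstation
USED_GB = 20        # upper bound on space used by OS and applications

unused_per_machine_gb = DISK_GB - USED_GB
total_unused_gb = WORKSTATIONS * unused_per_machine_gb

print(unused_per_machine_gb, total_unused_gb)  # prints: 100 1400
```

That is at least 1,400GB, or about 1.4TB, across the lab.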
Failover
One of the issues that arises with user accounts is backups and
general system reliability. Our failover scheme is pretty simple. Our
second server mirrors the first one. The servers sync home directories,
databases and certain configuration files nightly. If the main server
should go down, a number of senior people can execute a failover script
through sudo. This script resets the
IP address of the machine to be the IP address of the main server and
starts various services such as DHCP, NFS and LDAP.
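A rough Python sketch of what such a failover script has to do follows. Everything specific in it is an assumption made for illustration: the interface name eth0, the service names dhcpd, nfs and ldap, and the main server's address 192.168.1.1 are placeholders, not our actual settings. By default it only prints the commands it would run.

```python
import subprocess


def failover_commands(primary_ip, netmask, interface="eth0",
                      services=("dhcpd", "nfs", "ldap")):
    """Return the commands that make the backup server take over."""
    # first take over the main server's address on the given interface
    commands = [["/sbin/ifconfig", interface, primary_ip, "netmask", netmask, "up"]]
    # then start the services the main server normally provides
    commands += [["/sbin/service", name, "start"] for name in services]
    return commands


def run_failover(primary_ip, netmask, dry_run=True):
    """Print the commands, or execute them as root when dry_run is False."""
    for command in failover_commands(primary_ip, netmask):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_failover("192.168.1.1", "255.255.255.0")  # placeholder address
```

Keeping the command list separate from its execution makes it possible to review a failover plan before anyone needs it in an emergency.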
This design isn't meant for high-availability, but it does help to prevent
user data loss. In our environment, we encourage users to back up
critical data and archive it regularly. Increasingly, in our university,
user data backup becomes the responsibility of each user, with central
computing services concentrating only on disaster recovery services.
Security
Within university environments, network access can be very open--as open
as the university itself. The majority of Windows OS machines on the
network are highly susceptible to all sorts of attacks in spite of
increasingly intense efforts to protect them. Infected machines create
network traffic, and we see performance dips because of that traffic. In
order to protect our lab from threats, a number of security measures are
in place, in addition to those described above.
To enhance security, access is controlled by associating
specific IP addresses to specific resources on the workstations and
servers. We control that access primarily through the use of iptables.
The server firewall allows incoming SSH traffic from anywhere. It then
performs IP address filtering to allow only certain IP addresses access
to more open resources, such as NFS, LDAP, CUPS and the FlexLM license
server. The Web server uses a slightly different setup to allow only
incoming SSH and HTTP traffic.
Each of the workstations has even more restrictive settings to allow
only incoming SSH and VNC traffic. Outgoing traffic on all the machines
is considered to be secure.
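The policy described above can be written down as a short rule generator. The Python sketch below only builds iptables command lines as text. It is a simplified illustration, not our production firewall: it covers TCP only, the function name is hypothetical, the trusted address is the example workstation from the DHCP listing, and the VNC port 5900 assumes display 0.

```python
def firewall_rules(open_ports, trusted_hosts=(), restricted_ports=()):
    """Build iptables commands: some ports open to all, others to trusted hosts."""
    rules = [
        "iptables -A INPUT -i lo -j ACCEPT",
        # replies to connections the machine itself opened
        "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
    ]
    for port in open_ports:
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    for host in trusted_hosts:
        for port in restricted_ports:
            rules.append(
                f"iptables -A INPUT -p tcp -s {host} --dport {port} -j ACCEPT")
    # the default policy goes last so earlier rules are in place before it applies
    rules.append("iptables -P INPUT DROP")
    return rules


# Server: SSH from anywhere; LDAP (389), CUPS (631) and NFS (2049) by address.
SERVER = firewall_rules([22], ["192.168.1.161"], [389, 631, 2049])
# Web server: SSH and HTTP only. Workstation: SSH and VNC only.
WEB = firewall_rules([22, 80])
WORKSTATION = firewall_rules([22, 5900])

if __name__ == "__main__":
    print("\n".join(SERVER))
```

No rule is generated for the OUTPUT chain, which matches treating outgoing traffic as secure.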
Each of the services allowed uses standard ports listed in /etc/services
with the exception of NFS and FlexLM. FlexLM port usage can be
controlled by modifying the license file by adding a specific port to
use. NFS is a little more troublesome. In order to get NFS set up to
use static ports, some background information is needed on how it works.
For a detailed description, you can refer to the LinWiz documentation (see Resources).
The LinWiz site also includes a Web app that allows you to create an iptables
configuration you can use as a starting point for your firewall.
Future Directions
There is more to do. Our workstations are on two
campuses and our file and authentication server is on one. Our
intercampus network is less reliable than our LANs, and NFS traffic
between campuses is not encrypted. We intend to convert from NFS
to a distributed filesystem that tolerates network outages better and
takes advantage of both the extensive storage housed on our workstations
and our more efficiently communicating LANs.
Resources
Kickstart Template
Bruce Byrne is a PhD geneticist who learned that he needed
computers when he no longer could calculate the outcome of DNA cloning ventures
using scissors and highlighters. Bruce is the Associate Director for
Education at the Informatics Institute of UMDNJ, where he heads the
graduate school's Concentration in Bioinformatics.
John Kerrigan is a PhD chemist who learned that he could
leave test tubes in the laboratory and make his new molecules in silico.
John is the computational biologist at the University's Academic
Computing Service, teaches in the Concentration and collaborates with
research scientists interested particularly in rational drug design and
biophysics.
Ryan Golhar is a computer scientist who found an interest
in applying computer science to biology. Ryan currently is pursuing his
PhD at UMDNJ.














Comments
We recognized that software
We recognized that software and configuration information should be stored in a centralized server and available to authenticated clients. Our generic Web and e-mail servers already were overburdened with these services. In addition, each of those servers faced its own distinctive security threats and solutions. A better approach would be to establish a separate server dedicated to serving the scientific community, a scientific server. We needed to bring this project in on a modest budget. I think that this is clever idea. Budget is all it is about.
Tom
Re: A Secure Bioinformatics Linux Lab in an Educational Research
I am curious as to why you would add new users with the username and the password being the same, also why no minimum password expiration was given (possibly this was for the sake of the article, if not, then publishing the lab/machine names and the fact that default usernames are replicated for passwords would be twice as bad)?
Below is a simple suggestion for a perl subroutine which can be modified to your liking to generate semi-random passwords.
use String::Random;

sub make_pass {
    my $gen  = String::Random->new;
    my $pass = $gen->randpattern("CCnnccC");   # change this pattern to taste
    print "New password is $pass\n";
    my $pwd  = (getpwuid($<))[1];              # current user's crypted password field
    my $salt = substr($pwd, 0, 2);             # its first two characters serve as the salt
    my $newpass = crypt($pass, $salt);
    print "New crypt is $newpass\n";
}
Maybe I just overlooked where the users are forced to change their password on the initial login.
Good to see some more ink dealing with research institutions.
Phil M.
San Diego
passwd command on Linux works just fine with LDAP
We are also a bioinformatics lab, although a smaller one with a bit less teaching responsibilities. We moved from NIS to LDAP authentication about six months ago. The passwd command on modern Linuces knows how to deal with LDAP and can change passwords in a LDAP directory just fine. Users must have write access to their own passwords in the LDAP directory for this to work, but that is trivial to configure.
Scripting languages like Perl and Python can generate passwords encrypted in various ways, so I do not quite see why the change_password_perlscript invokes the 'passwd' command and uses a local /etc/shadow file from which the encrypted password is stripped. Seems like climbing to the tree backwards when the necessary LDIF could be generated in the script directly.
A nice graphical LDAP browser/editor named LUMA (project on Sourceforge) can do mass-creation of users and passwords. Unfortunately the working versions of LUMA depend on new versions of other packages, so getting it to run on anything but the latest distros can be an exercise.
Hostname "hydrogen"
I used to work at a place where the hostnames were element names, by atomic number -- if you knew your periodic table you didn't need the DNS server, which I think was lithium. Hydrogen was the gateway,