Chemistry the Gromacs Way
In this article, I'm diving into chemistry again. Many packages, both commercial and open source, are available to make chemistry calculations at the quantum level. The one I cover here is gromacs (http://www.gromacs.org). It should be available for your distribution via its package manager.
The gromacs package is actually a full complement of small applications to take care of all of the steps from creating the initialization files, to doing the computational run, to doing analysis and visualization of the results. You also may notice more than one version available. For example, Ubuntu has both a single-processor version and an MPI version to do the calculations on a parallel machine. If you have a multicore machine or a cluster of machines at your disposal, this can help speed up your calculations quite a bit.
Before I start, here's a little background as to what types of calculations are possible and what methods gromacs uses. Ideally in computational chemistry, you should be able to take the Schrodinger's equations for the system in question and solve them completely. But, outside of very simple systems of only a few atoms, doing so becomes impossible rather quickly. This means some sort of approximation is required to get a result. In gromacs, the approximation method is molecular dynamics (MD). MD simulations solve Newton's equation for a set of interacting particles to get their trajectories. The total force on a particle from all of the other particles is calculated and then divided by the mass of the particle to get its acceleration. This is calculated for all of the particles, and every one moves according to its acceleration. Then, time is stepped one unit, and the whole thing is calculated again. You can increase the spatial and temporal resolution, at the cost of more computer time, until you get results that are accurate enough for your purposes.
MD simulations have several limitations. The first is that it is a classical simulation. In most cases, this is fine, but there are situations when you need to know the quantum effects of the atoms in your system. The second limitation is that electrons are in their ground state and are treated as if they move instantly to remain in orbit around their atoms. This means you can't model chemical reactions or any other electronic interactions. The next limitation is that long-range interactions are cut off. This becomes a serious issue when dealing with charged particles. The last limitation that I look at here is the fact that periodic boundary conditions are being used to try to simulate bulk systems. This, combined with the cut-off mentioned above, means you can end up with some unphysical results, especially during long runs.
Now that you have some background, how do you actually use gromacs? You need to start with the inputs: the initial position of all the atoms, the initial velocities and the interaction potential describing the forces between all of the atoms. The center of mass of the entire system is usually defined as having zero velocity, meaning there are no external forces on the system. Once this initial data is entered, gromacs needs to calculate the forces, apply these forces to each particle and calculate their new positions. Once they have moved, gromacs needs to recalculate the forces. This is done using the leapfrog algorithm. To get a better feel for what this looks like, let's consider a concrete example: a protein in water.
The first step is to come up with the initialization files needed to
do your calculation. This can be done completely from scratch, but in
many cases, it isn't necessary. In the case of proteins, there is the
Protein Data Bank (PDB), located at http://www.pdb.org. Make
sure the protein structure has the required detail you are looking for
in your simulation. You also can load it up in PyMOL and see what it
looks like. When you are happy with your selection, you can use it to
generate the required initialization files for gromacs with
pdb2gmx -f my_protein.pdb -water tip3p
tip3p is one of the water models that you can select. This
command generates several output files, the most important of which are
conf.gro, topol.top and posre.itp. At this point, you still haven't
added the water to the initialization files. To do so, you first need to
define a box to hold the water and the protein. To do that, you
would edit the configuration files with the following:
editconf -f conf.gro -bt dodecahedron -d 0.5 -o box.gro
This defines a box with the shape of a dodecahedron and a diameter of 0.5nm. You also can use boxes with cubic or octahedron shapes. Now that you have a box, you can add the water with the command:
genbox -cp box.gro -cs spc216.gro -p topol.top -o solvated.gro
This command takes the box (box.gro) and fills it with water molecules
as defined in the file spc216.gro. The
-p topol.top option adds this
water to the topology of the system. Finally, all of this is written
out to the file solvated.gro.
If you tried to run it now, you probably would run into issues because of large forces caused by the introduced water molecules. To deal with that, you can minimize the energy of the system before actually doing any calculations. You can do this by creating a parameter file to define how to do the minimization. For example:
------em.mdp------ integrator = steep nsteps = 200 nstlist = 10 rlist = 1.0 coulombtype = pme rcoulomb = 1.0 vdw-type = cut-off rvdw = 1.0 nstenergy = 10 ------------------
In this example, the minimization is being done by steepest-descent,
over 200 steps. You can look up the details of all the other options
in the gromacs documentation. With this finished, you can do all of the
necessary pre-processing with the
grompp -f em.mdp -p topol.top -c solvated.gro -o em.tpr
The actual minimization is handled through:
mdrun -v -deffnm em
em is used for all of the relevant filenames, with different
extensions. This makes it easier to do all of the pre-processing and
get the initialization steps completed on your desktop, then doing
the actual run on a supercomputer.
When you are ready to do the final run, you need to set up a parameter file describing the details. In this example, you could use something like this:
------run.mdp------ integrator = md nsteps = 5000 dt = 0.002 nstlist = 10 rlist = 1.0 coulombtype = pme rcoulomb = 1.0 vdw-type = cut-off rvdw = 1.0 tcoupl = Berendsen tc-grps = protein non-protein tau-t = 0.1 0.1 ref-t = 298 298 nstxout = 1000 nstvout = 1000 nstxtcout = 100 nstenergy = 100 ------------------
With this parameter file, you can do the pre-processing with:
grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr
The actual MD calculation is done with the command:
mdrun -v -deffnm run
Now you can go off and drink a few days' worth of coffee. The actual runtime will depend on how many processors you have to throw at the problem.
Once this run is completed, you still need to analyze it to see what you can learn from it. You can compare the results to experimental results from X-ray structure measurements. You can measure the displacement of the heavy atoms from the X-ray structure with the g_rms program. You can analyze distance and hydrogen bonds with the g_dist and g_hbond programs. You can even make a movie with the trjconv program:
trjconv -s run.gro -f run.xtc -e 2500.0 -o movie.pdb
This will export 2.5ns to a movie of the trajectories. You then can view it using PyMOL.
This short article has provided only the faintest taste of what gromacs can do. I hope it sparks your interest in reading up on it and doing some new and interesting work.
Chemistry image via Shutterstock.com.
Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
Practical Task Scheduling Deployment
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.View Now!
|The Firebird Project's Firebird Relational Database||Jul 29, 2016|
|Stunnel Security for Oracle||Jul 28, 2016|
|SUSE LLC's SUSE Manager||Jul 21, 2016|
|My +1 Sword of Productivity||Jul 20, 2016|
|Non-Linux FOSS: Caffeine!||Jul 19, 2016|
|Murat Yener and Onur Dundar's Expert Android Studio (Wrox)||Jul 18, 2016|
- The Firebird Project's Firebird Relational Database
- Stunnel Security for Oracle
- My +1 Sword of Productivity
- Non-Linux FOSS: Caffeine!
- Managing Linux Using Puppet
- SUSE LLC's SUSE Manager
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Doing for User Space What We Did for Kernel Space
- Google's SwiftShader Released
- SuperTuxKart 0.9.2 Released
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide