Chemistry the Gromacs Way
In this article, I'm diving into chemistry again. Many packages, both commercial and open source, are available to make chemistry calculations at the quantum level. The one I cover here is gromacs (http://www.gromacs.org). It should be available for your distribution via its package manager.
The gromacs package is actually a full complement of small applications to take care of all of the steps from creating the initialization files, to doing the computational run, to doing analysis and visualization of the results. You also may notice more than one version available. For example, Ubuntu has both a single-processor version and an MPI version to do the calculations on a parallel machine. If you have a multicore machine or a cluster of machines at your disposal, this can help speed up your calculations quite a bit.
Before I start, here's a little background as to what types of calculations are possible and what methods gromacs uses. Ideally in computational chemistry, you should be able to take the Schrodinger's equations for the system in question and solve them completely. But, outside of very simple systems of only a few atoms, doing so becomes impossible rather quickly. This means some sort of approximation is required to get a result. In gromacs, the approximation method is molecular dynamics (MD). MD simulations solve Newton's equation for a set of interacting particles to get their trajectories. The total force on a particle from all of the other particles is calculated and then divided by the mass of the particle to get its acceleration. This is calculated for all of the particles, and every one moves according to its acceleration. Then, time is stepped one unit, and the whole thing is calculated again. You can increase the spatial and temporal resolution, at the cost of more computer time, until you get results that are accurate enough for your purposes.
MD simulations have several limitations. The first is that it is a classical simulation. In most cases, this is fine, but there are situations when you need to know the quantum effects of the atoms in your system. The second limitation is that electrons are in their ground state and are treated as if they move instantly to remain in orbit around their atoms. This means you can't model chemical reactions or any other electronic interactions. The next limitation is that long-range interactions are cut off. This becomes a serious issue when dealing with charged particles. The last limitation that I look at here is the fact that periodic boundary conditions are being used to try to simulate bulk systems. This, combined with the cut-off mentioned above, means you can end up with some unphysical results, especially during long runs.
Now that you have some background, how do you actually use gromacs? You need to start with the inputs: the initial position of all the atoms, the initial velocities and the interaction potential describing the forces between all of the atoms. The center of mass of the entire system is usually defined as having zero velocity, meaning there are no external forces on the system. Once this initial data is entered, gromacs needs to calculate the forces, apply these forces to each particle and calculate their new positions. Once they have moved, gromacs needs to recalculate the forces. This is done using the leapfrog algorithm. To get a better feel for what this looks like, let's consider a concrete example: a protein in water.
The first step is to come up with the initialization files needed to
do your calculation. This can be done completely from scratch, but in
many cases, it isn't necessary. In the case of proteins, there is the
Protein Data Bank (PDB), located at http://www.pdb.org. Make
sure the protein structure has the required detail you are looking for
in your simulation. You also can load it up in PyMOL and see what it
looks like. When you are happy with your selection, you can use it to
generate the required initialization files for gromacs with
pdb2gmx:
pdb2gmx -f my_protein.pdb -water tip3p
where tip3p is one of the water models that you can select. This
command generates several output files, the most important of which are
conf.gro, topol.top and posre.itp. At this point, you still haven't
added the water to the initialization files. To do so, you first need to
define a box to hold the water and the protein. To do that, you
would edit the configuration files with the following:
editconf -f conf.gro -bt dodecahedron -d 0.5 -o box.gro
This defines a box with the shape of a dodecahedron and a diameter of 0.5nm. You also can use boxes with cubic or octahedron shapes. Now that you have a box, you can add the water with the command:
genbox -cp box.gro -cs spc216.gro -p topol.top -o solvated.gro
This command takes the box (box.gro) and fills it with water molecules
as defined in the file spc216.gro. The -p topol.top option adds this
water to the topology of the system. Finally, all of this is written
out to the file solvated.gro.
If you tried to run it now, you probably would run into issues because of large forces caused by the introduced water molecules. To deal with that, you can minimize the energy of the system before actually doing any calculations. You can do this by creating a parameter file to define how to do the minimization. For example:
------em.mdp------
integrator = steep
nsteps = 200
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
nstenergy = 10
------------------
In this example, the minimization is being done by steepest-descent,
over 200 steps. You can look up the details of all the other options
in the gromacs documentation. With this finished, you can do all of the
necessary pre-processing with the grompp
command:
grompp -f em.mdp -p topol.top -c solvated.gro -o em.tpr
The actual minimization is handled through:
mdrun -v -deffnm em
The prefix em is used for all of the relevant filenames, with different
extensions. This makes it easier to do all of the pre-processing and
get the initialization steps completed on your desktop, then doing
the actual run on a supercomputer.
When you are ready to do the final run, you need to set up a parameter file describing the details. In this example, you could use something like this:
------run.mdp------
integrator = md
nsteps = 5000
dt = 0.002
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = Berendsen
tc-grps = protein non-protein
tau-t = 0.1 0.1
ref-t = 298 298
nstxout = 1000
nstvout = 1000
nstxtcout = 100
nstenergy = 100
------------------
With this parameter file, you can do the pre-processing with:
grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr
The actual MD calculation is done with the command:
mdrun -v -deffnm run
Now you can go off and drink a few days' worth of coffee. The actual runtime will depend on how many processors you have to throw at the problem.
Once this run is completed, you still need to analyze it to see what you can learn from it. You can compare the results to experimental results from X-ray structure measurements. You can measure the displacement of the heavy atoms from the X-ray structure with the g_rms program. You can analyze distance and hydrogen bonds with the g_dist and g_hbond programs. You can even make a movie with the trjconv program:
trjconv -s run.gro -f run.xtc -e 2500.0 -o movie.pdb
This will export 2.5ns to a movie of the trajectories. You then can view it using PyMOL.
This short article has provided only the faintest taste of what gromacs can do. I hope it sparks your interest in reading up on it and doing some new and interesting work.
Chemistry image via Shutterstock.com.
Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- New Products
- Linux Systems Administrator
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- Web & UI Developer (JavaScript & j Query)
- Designing Electronics with Linux
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Reply to comment | Linux Journal
3 hours 56 min ago - Nice article, thanks for the
14 hours 37 min ago - I once had a better way I
20 hours 23 min ago - Not only you I too assumed
20 hours 40 min ago - another very interesting
22 hours 33 min ago - Reply to comment | Linux Journal
1 day 27 min ago - Reply to comment | Linux Journal
1 day 7 hours ago - Reply to comment | Linux Journal
1 day 7 hours ago - Favorite (and easily brute-forced) pw's
1 day 9 hours ago - Have you tried Boxen? It's a
1 day 15 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
Reply to comment | Linux Journal
If some one wants to be updated with most recent technologies then he must be go to see this web site and be
up to date all the time.