Chemistry the Gromacs Way
In this article, I'm diving into chemistry again. Many packages, both commercial and open source, are available to make chemistry calculations at the quantum level. The one I cover here is gromacs (http://www.gromacs.org). It should be available for your distribution via its package manager.
The gromacs package is actually a full complement of small applications to take care of all of the steps from creating the initialization files, to doing the computational run, to doing analysis and visualization of the results. You also may notice more than one version available. For example, Ubuntu has both a single-processor version and an MPI version to do the calculations on a parallel machine. If you have a multicore machine or a cluster of machines at your disposal, this can help speed up your calculations quite a bit.
Before I start, here's a little background as to what types of calculations are possible and what methods gromacs uses. Ideally in computational chemistry, you should be able to take the Schrodinger's equations for the system in question and solve them completely. But, outside of very simple systems of only a few atoms, doing so becomes impossible rather quickly. This means some sort of approximation is required to get a result. In gromacs, the approximation method is molecular dynamics (MD). MD simulations solve Newton's equation for a set of interacting particles to get their trajectories. The total force on a particle from all of the other particles is calculated and then divided by the mass of the particle to get its acceleration. This is calculated for all of the particles, and every one moves according to its acceleration. Then, time is stepped one unit, and the whole thing is calculated again. You can increase the spatial and temporal resolution, at the cost of more computer time, until you get results that are accurate enough for your purposes.
MD simulations have several limitations. The first is that it is a classical simulation. In most cases, this is fine, but there are situations when you need to know the quantum effects of the atoms in your system. The second limitation is that electrons are in their ground state and are treated as if they move instantly to remain in orbit around their atoms. This means you can't model chemical reactions or any other electronic interactions. The next limitation is that long-range interactions are cut off. This becomes a serious issue when dealing with charged particles. The last limitation that I look at here is the fact that periodic boundary conditions are being used to try to simulate bulk systems. This, combined with the cut-off mentioned above, means you can end up with some unphysical results, especially during long runs.
Now that you have some background, how do you actually use gromacs? You need to start with the inputs: the initial position of all the atoms, the initial velocities and the interaction potential describing the forces between all of the atoms. The center of mass of the entire system is usually defined as having zero velocity, meaning there are no external forces on the system. Once this initial data is entered, gromacs needs to calculate the forces, apply these forces to each particle and calculate their new positions. Once they have moved, gromacs needs to recalculate the forces. This is done using the leapfrog algorithm. To get a better feel for what this looks like, let's consider a concrete example: a protein in water.
The first step is to come up with the initialization files needed to
do your calculation. This can be done completely from scratch, but in
many cases, it isn't necessary. In the case of proteins, there is the
Protein Data Bank (PDB), located at http://www.pdb.org. Make
sure the protein structure has the required detail you are looking for
in your simulation. You also can load it up in PyMOL and see what it
looks like. When you are happy with your selection, you can use it to
generate the required initialization files for gromacs with
pdb2gmx:
pdb2gmx -f my_protein.pdb -water tip3p
where tip3p is one of the water models that you can select. This
command generates several output files, the most important of which are
conf.gro, topol.top and posre.itp. At this point, you still haven't
added the water to the initialization files. To do so, you first need to
define a box to hold the water and the protein. To do that, you
would edit the configuration files with the following:
editconf -f conf.gro -bt dodecahedron -d 0.5 -o box.gro
This defines a box with the shape of a dodecahedron and a diameter of 0.5nm. You also can use boxes with cubic or octahedron shapes. Now that you have a box, you can add the water with the command:
genbox -cp box.gro -cs spc216.gro -p topol.top -o solvated.gro
This command takes the box (box.gro) and fills it with water molecules
as defined in the file spc216.gro. The -p topol.top option adds this
water to the topology of the system. Finally, all of this is written
out to the file solvated.gro.
If you tried to run it now, you probably would run into issues because of large forces caused by the introduced water molecules. To deal with that, you can minimize the energy of the system before actually doing any calculations. You can do this by creating a parameter file to define how to do the minimization. For example:
------em.mdp------
integrator = steep
nsteps = 200
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
nstenergy = 10
------------------
In this example, the minimization is being done by steepest-descent,
over 200 steps. You can look up the details of all the other options
in the gromacs documentation. With this finished, you can do all of the
necessary pre-processing with the grompp
command:
grompp -f em.mdp -p topol.top -c solvated.gro -o em.tpr
The actual minimization is handled through:
mdrun -v -deffnm em
The prefix em is used for all of the relevant filenames, with different
extensions. This makes it easier to do all of the pre-processing and
get the initialization steps completed on your desktop, then doing
the actual run on a supercomputer.
When you are ready to do the final run, you need to set up a parameter file describing the details. In this example, you could use something like this:
------run.mdp------
integrator = md
nsteps = 5000
dt = 0.002
nstlist = 10
rlist = 1.0
coulombtype = pme
rcoulomb = 1.0
vdw-type = cut-off
rvdw = 1.0
tcoupl = Berendsen
tc-grps = protein non-protein
tau-t = 0.1 0.1
ref-t = 298 298
nstxout = 1000
nstvout = 1000
nstxtcout = 100
nstenergy = 100
------------------
With this parameter file, you can do the pre-processing with:
grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr
The actual MD calculation is done with the command:
mdrun -v -deffnm run
Now you can go off and drink a few days' worth of coffee. The actual runtime will depend on how many processors you have to throw at the problem.
Once this run is completed, you still need to analyze it to see what you can learn from it. You can compare the results to experimental results from X-ray structure measurements. You can measure the displacement of the heavy atoms from the X-ray structure with the g_rms program. You can analyze distance and hydrogen bonds with the g_dist and g_hbond programs. You can even make a movie with the trjconv program:
trjconv -s run.gro -f run.xtc -e 2500.0 -o movie.pdb
This will export 2.5ns to a movie of the trajectories. You then can view it using PyMOL.
This short article has provided only the faintest taste of what gromacs can do. I hope it sparks your interest in reading up on it and doing some new and interesting work.
Chemistry image via Shutterstock.com.
Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- RSS Feeds
- Trying to Tame the Tablet
- New Products
- What's the tweeting protocol?
- Dart: a New Web Programming Experience
- Hey God - You may not be
16 min 18 sec ago - Reply to comment | Linux Journal
2 hours 48 min ago - Drupal is an Awesome CMS and a Crappy development framework
7 hours 28 min ago - IT industry leaders
9 hours 50 min ago - Reply to comment | Linux Journal
1 day 2 hours ago - Reply to comment | Linux Journal
1 day 5 hours ago - Reply to comment | Linux Journal
1 day 6 hours ago - great post
1 day 7 hours ago - Google Docs
1 day 7 hours ago - Reply to comment | Linux Journal
1 day 12 hours ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.



Comments
Reply to comment | Linux Journal
If some one wants to be updated with most recent technologies then he must be go to see this web site and be
up to date all the time.