Chemistry the Gromacs Way

In this article, I'm diving into chemistry again. Many packages, both commercial and open source, are available to make chemistry calculations at the quantum level. The one I cover here is gromacs (http://www.gromacs.org). It should be available for your distribution via its package manager.

The gromacs package is actually a full complement of small applications to take care of all of the steps from creating the initialization files, to doing the computational run, to doing analysis and visualization of the results. You also may notice more than one version available. For example, Ubuntu has both a single-processor version and an MPI version to do the calculations on a parallel machine. If you have a multicore machine or a cluster of machines at your disposal, this can help speed up your calculations quite a bit.

Before I start, here's a little background as to what types of calculations are possible and what methods gromacs uses. Ideally in computational chemistry, you should be able to take the Schrodinger's equations for the system in question and solve them completely. But, outside of very simple systems of only a few atoms, doing so becomes impossible rather quickly. This means some sort of approximation is required to get a result. In gromacs, the approximation method is molecular dynamics (MD). MD simulations solve Newton's equation for a set of interacting particles to get their trajectories. The total force on a particle from all of the other particles is calculated and then divided by the mass of the particle to get its acceleration. This is calculated for all of the particles, and every one moves according to its acceleration. Then, time is stepped one unit, and the whole thing is calculated again. You can increase the spatial and temporal resolution, at the cost of more computer time, until you get results that are accurate enough for your purposes.

MD simulations have several limitations. The first is that it is a classical simulation. In most cases, this is fine, but there are situations when you need to know the quantum effects of the atoms in your system. The second limitation is that electrons are in their ground state and are treated as if they move instantly to remain in orbit around their atoms. This means you can't model chemical reactions or any other electronic interactions. The next limitation is that long-range interactions are cut off. This becomes a serious issue when dealing with charged particles. The last limitation that I look at here is the fact that periodic boundary conditions are being used to try to simulate bulk systems. This, combined with the cut-off mentioned above, means you can end up with some unphysical results, especially during long runs.

Now that you have some background, how do you actually use gromacs? You need to start with the inputs: the initial position of all the atoms, the initial velocities and the interaction potential describing the forces between all of the atoms. The center of mass of the entire system is usually defined as having zero velocity, meaning there are no external forces on the system. Once this initial data is entered, gromacs needs to calculate the forces, apply these forces to each particle and calculate their new positions. Once they have moved, gromacs needs to recalculate the forces. This is done using the leapfrog algorithm. To get a better feel for what this looks like, let's consider a concrete example: a protein in water.

The first step is to come up with the initialization files needed to do your calculation. This can be done completely from scratch, but in many cases, it isn't necessary. In the case of proteins, there is the Protein Data Bank (PDB), located at http://www.pdb.org. Make sure the protein structure has the required detail you are looking for in your simulation. You also can load it up in PyMOL and see what it looks like. When you are happy with your selection, you can use it to generate the required initialization files for gromacs with pdb2gmx:


pdb2gmx -f my_protein.pdb -water tip3p

where tip3p is one of the water models that you can select. This command generates several output files, the most important of which are conf.gro, topol.top and posre.itp. At this point, you still haven't added the water to the initialization files. To do so, you first need to define a box to hold the water and the protein. To do that, you would edit the configuration files with the following:


editconf -f conf.gro -bt dodecahedron -d 0.5 -o box.gro

This defines a box with the shape of a dodecahedron and a diameter of 0.5nm. You also can use boxes with cubic or octahedron shapes. Now that you have a box, you can add the water with the command:


genbox -cp box.gro -cs spc216.gro -p topol.top -o solvated.gro

This command takes the box (box.gro) and fills it with water molecules as defined in the file spc216.gro. The -p topol.top option adds this water to the topology of the system. Finally, all of this is written out to the file solvated.gro.

If you tried to run it now, you probably would run into issues because of large forces caused by the introduced water molecules. To deal with that, you can minimize the energy of the system before actually doing any calculations. You can do this by creating a parameter file to define how to do the minimization. For example:


------em.mdp------
integrator   = steep
nsteps       = 200
nstlist      = 10
rlist        = 1.0
coulombtype  = pme
rcoulomb     = 1.0
vdw-type     = cut-off
rvdw         = 1.0
nstenergy    = 10
------------------

In this example, the minimization is being done by steepest-descent, over 200 steps. You can look up the details of all the other options in the gromacs documentation. With this finished, you can do all of the necessary pre-processing with the grompp command:


grompp -f em.mdp -p topol.top -c solvated.gro -o em.tpr

The actual minimization is handled through:


mdrun -v -deffnm em

The prefix em is used for all of the relevant filenames, with different extensions. This makes it easier to do all of the pre-processing and get the initialization steps completed on your desktop, then doing the actual run on a supercomputer.

When you are ready to do the final run, you need to set up a parameter file describing the details. In this example, you could use something like this:


------run.mdp------
integrator   = md
nsteps       = 5000
dt           = 0.002
nstlist      = 10
rlist        = 1.0
coulombtype  = pme
rcoulomb     = 1.0
vdw-type     = cut-off
rvdw         = 1.0
tcoupl       = Berendsen
tc-grps      = protein non-protein
tau-t        = 0.1 0.1
ref-t        = 298 298
nstxout      = 1000
nstvout      = 1000
nstxtcout    = 100
nstenergy    = 100
------------------

With this parameter file, you can do the pre-processing with:


grompp -f run.mdp -p topol.top -c pr.gro -o run.tpr

The actual MD calculation is done with the command:


mdrun -v -deffnm run

Now you can go off and drink a few days' worth of coffee. The actual runtime will depend on how many processors you have to throw at the problem.

Once this run is completed, you still need to analyze it to see what you can learn from it. You can compare the results to experimental results from X-ray structure measurements. You can measure the displacement of the heavy atoms from the X-ray structure with the g_rms program. You can analyze distance and hydrogen bonds with the g_dist and g_hbond programs. You can even make a movie with the trjconv program:


trjconv -s run.gro -f run.xtc -e 2500.0 -o movie.pdb

This will export 2.5ns to a movie of the trajectories. You then can view it using PyMOL.

This short article has provided only the faintest taste of what gromacs can do. I hope it sparks your interest in reading up on it and doing some new and interesting work.

Chemistry image via Shutterstock.com.

Load Disqus comments